Abstract

The spatial clustering of points from two or more classes (or species) has important implications in many
fields and may cause the spatial patterns of segregation and association, which are two major types of spatial
interaction between the classes. The null patterns we consider are random labeling (RL) and complete
spatial randomness (CSR) of points from two or more classes, which is called CSR independence, henceforth.
The segregation and association patterns can be studied using a nearest neighbor contingency table (NNCT)
which is constructed using the frequencies of nearest neighbor (NN) types in a contingency table. Among
NNCT-tests (i.e., tests based on NNCTs), Pielou's test is equivalent to the usual (Pearson's) test of independence
for contingency tables, but is liberal under CSR independence or RL patterns. On the other hand,
Dixon's test of segregation has the desired significance level under the RL pattern. We propose three new
multivariate clustering tests based on NNCTs using the appropriate sampling distribution of the cell counts
in an NNCT and suggest a simple correction for Pielou's test for data with rectangular support. We compare
the finite sample performance of these new tests with Pielou's and Dixon's tests and Cuzick & Edwards'
k-NN tests in terms of empirical size under the null cases and empirical power under various segregation
and association alternatives and provide guidelines for using the tests in practice. We demonstrate that the
newly proposed NNCT-tests perform relatively well compared to their competitors and illustrate the tests
using three example data sets. Furthermore, we compare the NNCT-tests with the second-order methods
such as Ripley's L-function and pair correlation function using these examples.

Keywords: Association; spatial clustering; complete spatial randomness; independence; nearest neighbor methods;
random labeling; second-order analysis; spatial pattern



1 Introduction 



Spatial point patterns have important implications in various fields such as epidemiology, population biology, and ecology. 
It is of practical interest to investigate the pattern of one class not only with respect to the ground but also with respect
to the other classes (Pielou (1961), Whipple (1980), and Dixon (1994, 2002a)). For convenience and generality, we refer
to the different types of points as "classes", but the class can stand for any characteristic of an observation or a point at a
particular location. For example, the spatial segregation pattern has been investigated for plant species (Diggle (2003)),
age classes of plants (Hamill and Wright (1986)), fish species (Herler and Patzner (2005)), and sexes of dioecious plants
(Nanami et al. (1999)). Many of the epidemiological applications are for a two-class system of case and control labels
(Waller and Gotway (2004)).



Many tests of spatial segregation have been developed in literature (Orton (1982)). These include comparison
of Ripley's K- or L-functions (Ripley (2004)), comparison of nearest neighbor (NN) distances (Diggle (2003) and
Cuzick and Edwards (1990)), and analysis of nearest neighbor contingency tables (NNCTs) (Pielou (1961) and Meagher and Burdick
(1980)). NNCTs are constructed using the NN frequencies of classes. Kulldorff (2006) provides an extensive review of
tests of spatial randomness that adjust for an inhomogeneity of the densities of the underlying populations. In the
presence of numerous tests available, one advantage of NNCT-tests (i.e., tests based on NNCTs) is a theoretical one:
their asymptotic distributions are available. However, given the available computational tools, it might be a marginal
advantage in practice. Most of the tests in literature use Monte Carlo simulation or randomization tests (Kulldorff
(2006)). In fact, our NNCT-tests could also employ these simulation methods. A major consideration is the power of
the newly proposed NNCT-tests in comparison with the currently available tests. For better understanding of the power
performance, it is desirable to know the finite sample moments and empirical size performance of the tests under the null
cases. Usually, the tests of spatial randomness involve a parameter, which forces the user to resort to multiple testing
and then adjust for it. The NNCT-tests combine four tests in the $2 \times 2$ case, and $q^2$ tests in the $q \times q$ case, so by construction
they avoid the problem of multiple testing. Furthermore, they are potentially more powerful compared to other NN tests
which use less of the information provided. The effects of homogeneity or lack of it and how non-stationarity affects the
results of spatial pattern tests have recently been discussed in some detail in the ecological context (Perry et al. (2006))
and for other methods (e.g., Ripley's K-function: Baddeley et al. (2000)). Some of the tests for spatial randomness are
designed only for homogeneous populations, while the tests we consider adjust for any inhomogeneity of the data, in
the sense that these tests are appropriate for both homogeneous and inhomogeneous populations. For the $q$-class case
with $q > 2$ classes, the overall segregation tests provide information on the (small-scale) multivariate spatial interaction
in one compound summary measure, while Ripley's L-function or the pair correlation function requires performing all
bivariate spatial interaction analyses. When the overall test is significant, one can also use the cell-specific NNCT-tests
for pairwise post hoc analysis (Ceyhan (2008a)).

In this article, for simplicity, we describe the spatial point patterns for two classes only; the extension to multi-class 
case is straightforward. We consider two major types of spatial clustering patterns, namely association and segregation. 
The null pattern is usually one of the two (random) pattern types: complete spatial randomness (CSR) or random 
labeling (RL). The NNCT-tests do not suffer from the problem of incorrect specification of the null hypothesis (CSR 
independence or RL), since they are for testing a more refined null hypothesis: "the randomness in the NN structure". 
In this article, we introduce correction (for dependence in cell counts) strategies for Pielou's test. The first type of 
correction methods is derived analytically based on the correct distribution of the cell counts under CSR independence 
or RL, while the second type is based on Monte Carlo simulations. We only consider completely mapped data, i.e., the 
locations of all events in a defined space are observed. There have been some reservations on the appropriateness of
Pielou's tests for completely mapped data (see Meagher and Burdick (1980) and Dixon (1994)). Pielou's test assumes
independence of cell counts in the NNCT, but these cell counts are dependent since it is more likely for a point to be a 
NN of its NN (reflexivity in NN structure). 



2 Null and Alternative Patterns 



The appropriate null pattern for NNCT-tests is $H_o$: randomness in the NN structure. This null hypothesis usually results
from one of the two (random) pattern types: random labeling (RL) of a set of fixed points with two classes or complete
spatial randomness (CSR) of points from two classes, which is called "CSR independence", henceforth. That is, when the
points from each class are assumed to be uniformly distributed over the region of interest, then randomness in the NN
structure is implied by the CSR independence pattern. This CSR independence pattern is also referred to as "population
independence" in literature (Goreaud and Pélissier (2003)). Note that CSR independence is equivalent to the case that
RL procedure is applied to a given set of points from a CSR pattern, in the sense that after points are generated uniformly 
in the region, the class labels are assigned randomly. When only the labeling of a set of fixed points (the allocation of 
the points could be regular, aggregated, or clustered, or of lattice type) is random, the null hypothesis is implied by RL 
pattern. The distinction between CSR independence and RL is very important when defining the appropriate null model
for each empirical case; i.e., the null model depends on the particular ecological context. Goreaud and Pélissier (2003)
discuss the differences between these two null hypotheses and demonstrate that the misinterpretation is very common.
They also propose some guidelines to help decide which null hypothesis is appropriate and when. They assert that under
CSR (independence) the (locations of the points from) two classes are a priori the result of different processes (e.g.,
individuals of different species or age cohorts), whereas under RL, some processes affect a posteriori the individuals of
a single population (e.g., diseased versus non-diseased individuals of a single species). Notice also that although CSR
independence and RL are not the same, they lead to the same null model (i.e., randomness in NN structure) for tests using
NNCT, which does not require spatially-explicit information. In general CSR refers to a univariate pattern and implies
that the spatial distribution of points is random over the study area, but makes no assumption about the distribution 
of the labels within the set (e.g., points could conform to CSR and the marks or labels could also be segregated). 
To emphasize the distinction between univariate and multivariate CSR patterns, the univariate pattern is called just
"CSR", while the multivariate CSR pattern is called "CSR independence". Hence in this article, the CSR independence
pattern refers to the case in which the spatial distribution of points from each of the classes is random and uniform over the region of
interest. RL suggests that different labels are assigned to the points at random, but makes no assumption about the 
spatial arrangement of points (e.g., points could be spatially clumped, segregated or associated). 

We consider two major types of (bivariate) spatial clustering patterns of association and segregation as alternative 
patterns. Association occurs if the NN of an individual is more likely to be from another class. For example, in plant 
biology, the two classes of points might represent the coordinates of mutualistic plant species, so the species depend on 
each other to survive. As another example, one class of points might be the geometric coordinates of parasitic plants 
exploiting the other plant whose coordinates are of the other class. In epidemiology, one class of points might be the 
geographical coordinates of contaminant sources, such as a nuclear reactor or a factory emitting toxic waste, and the 
other class of points might be the coordinates of the residences of cases (i.e., incidences) of certain diseases, e.g., some 
type of cancer caused by the contaminant. Segregation occurs if the NN of an individual is more likely to be of the same
class as the individual; i.e., the members of the same class tend to be clumped or clustered (see, e.g., Pielou (1961)).
For instance, one type of plant might not grow well around another type of plant and vice versa. In plant biology, one 
class of points might represent the coordinates of trees from a species with large canopy, so that other plants (whose 
coordinates are the points from the other class) that need light cannot grow around these trees. See, for instance, (Dixon
(1994), Coomes et al. (1999)) for more detail. The segregation and association patterns are not symmetric in the sense
that, when two classes are segregated, they do not necessarily exhibit the same degree of segregation or when two classes 
are associated, one class could be more associated with the other. Many different forms of segregation (and association) 
are possible. Although it is not possible to list all types of segregation, its existence can be tested by an analysis of the
NN relationships between the classes (Pielou (1961)). Both patterns might result from differences between first-order or
second-order stationarity of the two classes or species. In general, departures from first-order homogeneity are likely to
be important drivers of segregation, and association would usually result from second-order effects. However, when
discussing the various examples, we refrain from mentioning any causality, which is not necessarily implied
by these tests.

We describe the construction of NNCTs in Section 3.1, provide NNCT-tests in Section 3.2, Pielou's and Dixon's
tests in Sections 3.3 and 3.4, respectively, and the new versions of segregation tests in Section 3.5. These are followed by other tests of spatial
clustering, the empirical significance levels of the tests, the empirical power analysis in Section 5, three
illustrative examples, and discussion and conclusions with guidelines for using the tests.



3 Nearest Neighbor Contingency Tables and Related Tests 
3.1 Construction of the Nearest Neighbor Contingency Tables 

NNCTs are constructed using the NN frequencies of classes. We describe the construction of NNCTs for two classes;
extension to multi-class case is straightforward. Consider two classes with labels $\{1, 2\}$. Let $N_i$ be the number of points
from class $i$ for $i \in \{1,2\}$ and $n$ be the total sample size, so $n = N_1 + N_2$. If we record the class of each point and the
class of its NN, the NN relationships fall into four distinct categories: (1, 1), (1, 2); (2, 1), (2, 2) where in cell $(i, j)$, class
$i$ is the base class, while class $j$ is the class of its NN. That is, the $n$ points constitute $n$ (base,NN) pairs. Then each pair
can be categorized with respect to the base label (row categories) and NN label (column categories). Denoting $N_{ij}$ as
the frequency of cell $(i,j)$ for $i,j \in \{1, 2\}$, we obtain the NNCT in Table 1 where $C_j$ is the sum of column $j$; i.e., number
of times class $j$ points serve as NNs for $j \in \{1, 2\}$. Furthermore, $N_{ij}$ is the cell count for cell $(i,j)$ that is the count of
all (base,NN) pairs each of which has label $(i,j)$. Note also that $n = \sum_{i,j} N_{ij}$; $n_i = \sum_{j=1}^{2} N_{ij}$; and $C_j = \sum_{i=1}^{2} N_{ij}$.
By construction, if $N_{ij}$ is larger (smaller) than expected, then class $j$ serves as NN more (less) to class $i$ than expected,
which implies (lack of) segregation if $i = j$ and (lack of) association of class $j$ with class $i$ if $i \neq j$. Furthermore, we
adopt the convention that variables denoted by upper (lower) case letters are random (fixed) quantities throughout the
article. Hence, column sums and cell counts are random, while row sums and the overall sum are fixed quantities in a
NNCT.





                              NN class
                      class 1    class 2    sum
base class  class 1   N_{11}     N_{12}     n_1
            class 2   N_{21}     N_{22}     n_2
            sum       C_1        C_2        n

Table 1: NNCT for two classes.

Observe that, under segregation, the diagonal entries, i.e., $N_{ii}$ for $i = 1,2$, tend to be larger than expected; under
association, the off-diagonals tend to be larger than expected. The general alternative is that some cell counts are 
different than expected under CSR independence or RL. 
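
The construction above can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the article: the function names `nearest_neighbor_indices` and `build_nnct`, the 0-based class labels, and the toy coordinates are all assumptions made for the example, and the brute-force distance computation is only suitable for small samples.

```python
import numpy as np

def nearest_neighbor_indices(points):
    """Return, for each point, the index of its nearest neighbor (brute force)."""
    pts = np.asarray(points, dtype=float)
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)   # a point cannot be its own NN
    return dist.argmin(axis=1)

def build_nnct(points, labels, n_classes=2):
    """Rows index the base class, columns the class of the NN (labels are 0-based)."""
    labels = np.asarray(labels)
    nn = nearest_neighbor_indices(points)
    table = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(table, (labels, labels[nn]), 1)   # count each (base, NN) pair once
    return table

# Toy data (made up): two well-separated clumps, one per class.
points = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.2],
                   [1.0, 1.0], [1.1, 1.0], [1.0, 1.2]])
labels = np.array([0, 0, 0, 1, 1, 1])
table = build_nnct(points, labels)
row_sums = table.sum(axis=1)   # class sizes n_i (fixed quantities)
col_sums = table.sum(axis=0)   # C_j: number of times class j serves as NN
```

In this toy configuration every NN is from the same class as its base point, so all counts fall on the diagonal, which is the extreme form of the segregation pattern described above.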



3.2 A Review of NNCT-Tests in Literature 



Pielou (1961) proposed tests (for segregation, symmetry, niche specificity, etc.) and Dixon introduced an overall test of
segregation, cell-, and class-specific tests based on NNCTs for the two-class case (Dixon (1994)) and extended his tests
to multi-class (or multi-species) case (Dixon (2002a)). Pielou (1961) used the usual Pearson's $\chi^2$-test of independence for
detecting the segregation of the two classes. Due to the ease in computation and interpretation, Pielou's test of segregation
is frequently used for both completely mapped and sparsely sampled data (Meagher and Burdick (1980)); indeed it is
more frequently used than Dixon's test. For example, Pielou's test is used for the segregation of males and females in
dioecious species (e.g., Herrera (1988) and Armstrong and Irving (1989)), and of different species (Good and Whipple
(1982)). Dixon (1994) points out two problems with Pielou's test: (i) it fails to identify certain types of segregation
(e.g., mother-daughter processes) and (ii) the sampling distribution of cell counts is not appropriate. The assumption
for the use of chi-square test for NNCTs is the independence between cell-counts (hence rows and columns), which
is violated for NNCTs based on the two classes from CSR independence or RL patterns. Dixon (1994) derived the
appropriate (asymptotic) sampling distribution of cell counts using Moran join count statistics (Moran (1948)) and hence
the appropriate test which also has a $\chi^2$-distribution (Dixon (1994)). Problem (ii) was first noted by Meagher and Burdick
(1980) who identify the main cause of it to be reflexivity of (base, NN) pairs. A pair of points is reflexive if each point
in the pair is the NN of the other point in the pair, regardless of the class of the two individuals. As an alternative, they
suggest using Monte Carlo simulations for Pielou's test. Dixon (1994) also argues that Pielou's test is not appropriate
for completely mapped data, but is appropriate for sparsely sampled data.

For the two-class case, Ceyhan (2006) compared these tests, and in addition to the previously mentioned reservations
on the use of Pielou's tests for completely mapped data, demonstrated that Pielou's tests are liberal under CSR
independence and RL and are only appropriate for a random sample of (base,NN) pairs.



3.3 Pielou's Test of Segregation 

In the two-class case, Pielou used Pearson's $\chi^2$-test of independence to detect any deviation from CSR independence or
RL (Pielou (1961)). The test statistic is

$$X_P^2 = \sum_{i=1}^{2} \sum_{j=1}^{2} \frac{\left(N_{ij} - E_P[N_{ij}]\right)^2}{E_P[N_{ij}]}. \tag{1}$$

When NNCT is based on a random sample of (base,NN) pairs, in Equation (1), $E_P[N_{ij}] = n_i c_j/n$ and $c_j$ is the sum
for column $j$; and $X_P^2$ is approximately distributed as $\chi^2_1$ (i.e., $\chi^2$ distribution with 1 degree of freedom) for large $n_i$.
Rejecting $H_o$: independence of cell counts for large values of $X_P^2$ (for $X_P^2 > \chi^2_1(1 - \alpha)$, the $(1 - \alpha)$ quantile of the $\chi^2_1$
distribution) yields a consistent test. But, under CSR independence or RL, this test is liberal; i.e., it has larger size than
the desired level (Ceyhan (2006)).
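
As a minimal sketch of Equation (1), the statistic can be computed directly from a $2 \times 2$ NNCT. The function name `pielou_statistic` and the cell counts below are hypothetical, chosen only to illustrate the arithmetic; the cut-off 3.841 is the usual 0.95 quantile of the $\chi^2_1$ distribution.

```python
import numpy as np

def pielou_statistic(table):
    """Pearson chi-square statistic of Equation (1) for an NNCT."""
    table = np.asarray(table, dtype=float)
    n = table.sum()
    # E_P[N_ij] = n_i * c_j / n, from the row sums n_i and column sums c_j
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    return float(((table - expected) ** 2 / expected).sum())

CHI2_1_Q95 = 3.841   # 0.95 quantile of the chi-square distribution with 1 df

nnct = [[20, 10], [10, 20]]   # hypothetical cell counts
x2_p = pielou_statistic(nnct)
reject = x2_p > CHI2_1_Q95
```

For these counts each expected cell count is 15, so the statistic is $4 \times 25/15 = 20/3$. As noted above, comparing this value with the $\chi^2_1$ quantile is only justified for a random sample of (base,NN) pairs; under CSR independence or RL the comparison is liberal.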






3.4 Dixon's NNCT-Tests 



Dixon proposed a series of tests for segregation based on NNCTs (Dixon (1994)). For Dixon's tests, the probability of
an individual from class j serving as a NN of an individual from class i depends only on the class sizes (i.e., row sums), 
but not the total number of times class j serves as NNs (i.e., column sums). 



3.4.1 Dixon's Cell-Specific Tests 

The level of segregation is estimated by comparing the observed cell counts to the expected cell counts under RL of
points that are fixed. Dixon demonstrates that under RL, one can write down the cell frequencies as Moran join count
statistics (Moran (1948)). He then derives the means, variances, and covariances of the cell counts (frequencies) in a
NNCT (Dixon (1994, 2002a)).

The null hypothesis of RL implies

$$E[N_{ij}] = \begin{cases} n_i (n_i - 1)/(n - 1) & \text{if } i = j, \\ n_i\, n_j/(n - 1) & \text{if } i \neq j. \end{cases} \tag{2}$$

Observe that the expected cell counts depend only on the size of each class (i.e., row sums), but not on column sums. 

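The cell-specific test statistics suggested by Dixon are given by

$$Z^D_{ij} = \frac{N_{ij} - E[N_{ij}]}{\sqrt{Var[N_{ij}]}}, \tag{3}$$

where

$$Var[N_{ij}] = \begin{cases} (n + R)\, p_{ii} + (2n - 2R + Q)\, p_{iii} + (n^2 - 3n - Q + R)\, p_{iiii} - (n\, p_{ii})^2 & \text{if } i = j, \\ n\, p_{ij} + Q\, p_{iij} + (n^2 - 3n - Q + R)\, p_{iijj} - (n\, p_{ij})^2 & \text{if } i \neq j, \end{cases} \tag{4}$$

where $p_{xx}$, $p_{xxx}$, and $p_{xxxx}$ are the probabilities that a randomly picked pair, triplet, or quartet of points, respectively,
are of the indicated classes and are given by

$$\begin{aligned}
p_{ii} &= \frac{n_i (n_i - 1)}{n (n - 1)}, & p_{ij} &= \frac{n_i\, n_j}{n (n - 1)}, \\
p_{iii} &= \frac{n_i (n_i - 1)(n_i - 2)}{n (n - 1)(n - 2)}, & p_{iij} &= \frac{n_i (n_i - 1)\, n_j}{n (n - 1)(n - 2)}, \\
p_{iijj} &= \frac{n_i (n_i - 1)\, n_j (n_j - 1)}{n (n - 1)(n - 2)(n - 3)}, & p_{iiii} &= \frac{n_i (n_i - 1)(n_i - 2)(n_i - 3)}{n (n - 1)(n - 2)(n - 3)}.
\end{aligned} \tag{5}$$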

Furthermore, $Q$ is the number of points with shared NNs, which occur when two or more points share a NN and $R$ is
twice the number of reflexive pairs. Then $Q = 2\,(Q_2 + 3\,Q_3 + 6\,Q_4 + 10\,Q_5 + 15\,Q_6)$ where $Q_k$ is the number of points
that serve as a NN to other points $k$ times. One-sided and two-sided tests are possible for each cell $(i,j)$ using the
asymptotic normal approximation of $Z^D_{ij}$ given in Equation (3) (Dixon (1994)). The test in Equation (3) is the same as
Dixon's $Z_{AA}$ when $i = j = 1$; same as $Z_{BB}$ when $i = j = 2$ (Dixon (1994)). Note also that in Equation (3) four different
tests are defined as there are four cells and each is testing deviation from the null case for the respective cell. These four
tests are combined and used in defining an overall test of segregation in Section 3.4.2.

Under CSR independence, the null hypothesis, the test statistics, and the variances are as in the RL case for the 
cell-specific tests, except that the variances are conditional on Q and R. 
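
The quantities entering Dixon's cell-specific tests can be sketched as follows. This is an illustrative sketch under stated assumptions, not the author's implementation: the NN structure is passed as a hypothetical index array `nn` (with `nn[a]` the index of the NN of point `a`), $Q$ is computed as the sum of $k(k-1)$ over points serving as a NN $k$ times (which equals $2\sum_k \binom{k}{2} Q_k$), and the function names are made up for the example.

```python
import numpy as np

def q_and_r(nn):
    """Q and R from the NN index array nn."""
    nn = np.asarray(nn)
    idx = np.arange(len(nn))
    r = int(np.sum(nn[nn] == idx))                # points in reflexive pairs = 2 * number of pairs
    served = np.bincount(nn, minlength=len(nn))   # times each point serves as a NN
    q = int(np.sum(served * (served - 1)))        # equals 2 * sum_k C(k, 2) * Q_k
    return q, r

def dixon_moments(n_i, n_j, n, q, r, same_class):
    """E[N_ij] and Var[N_ij] under RL, following the expressions of Section 3.4.1."""
    if same_class:
        p2 = n_i * (n_i - 1) / (n * (n - 1))      # p_ii
        p3 = p2 * (n_i - 2) / (n - 2)             # p_iii
        p4 = p3 * (n_i - 3) / (n - 3)             # p_iiii
        mean = n_i * (n_i - 1) / (n - 1)
        var = ((n + r) * p2 + (2 * n - 2 * r + q) * p3
               + (n * n - 3 * n - q + r) * p4 - (n * p2) ** 2)
    else:
        p2 = n_i * n_j / (n * (n - 1))            # p_ij
        p3 = p2 * (n_i - 1) / (n - 2)             # p_iij
        p4 = p3 * (n_j - 1) / (n - 3)             # p_iijj
        mean = n_i * n_j / (n - 1)
        var = n * p2 + q * p3 + (n * n - 3 * n - q + r) * p4 - (n * p2) ** 2
    return mean, var

def dixon_cell_z(table, q, r):
    """Matrix of cell-specific statistics (N_ij - E[N_ij]) / sqrt(Var[N_ij])."""
    table = np.asarray(table, dtype=float)
    sizes = table.sum(axis=1)
    n = sizes.sum()
    z = np.empty_like(table)
    for i in range(table.shape[0]):
        for j in range(table.shape[1]):
            mean, var = dixon_moments(sizes[i], sizes[j], n, q, r, i == j)
            z[i, j] = (table[i, j] - mean) / np.sqrt(var)
    return z

# Hypothetical NN structure for 8 points and a labeling with n_1 = n_2 = 4.
nn = np.array([1, 0, 1, 4, 3, 3, 5, 6])
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])
table = np.zeros((2, 2), dtype=int)
np.add.at(table, (labels, labels[nn]), 1)
q, r = q_and_r(nn)
z = dixon_cell_z(table, q, r)
```

Because the row sums are fixed, $N_{12} = n_1 - N_{11}$, so the two statistics in a row differ only in sign in the two-class case; each entry can then be compared with standard normal quantiles for a one-sided or two-sided test.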

Remark 3.1. The Status of Q and R under CSR Independence and RL: Note the difference in status of the 
variables Q and R under CSR independence and RL models. Under RL, Q and R are fixed quantities; while under 
CSR independence, they are random. The variances and covariances of the cell counts, and all the quantities depending on
them, also depend on $Q$ and $R$. Hence these expressions are appropriate under the RL pattern. Under the
CSR independence pattern, they are conditional variances and covariances obtained by conditioning on $Q$ and $R$. The
unconditional variances and covariances can be obtained by replacing $Q$ and $R$ with their expectations.
Unfortunately, given the difficulty of calculating the expectations of $Q$ and $R$ under CSR independence, it is reasonable
and convenient to use test statistics employing the conditional variances and covariances even when assessing their
behavior under CSR independence. Alternatively, one can estimate the expected values of $Q$ and $R$ empirically and
substitute these estimates in the expressions. For example, for the homogeneous planar Poisson process (conditional on
the sample size), we have $E[Q/n] \approx 0.632786$ and $E[R/n] \approx 0.621120$ (estimated empirically based on 1000000 Monte
Carlo simulations for various values of $n$ on the unit square). When $Q$ and $R$ are replaced by $0.63\,n$ and $0.62\,n$, respectively,
we obtain the so-called QR-adjusted tests. However, as shown in Ceyhan (2008b), QR-adjustment does not improve on
the unadjusted NNCT-tests. □



3.4.2 Dixon's Overall Test of Segregation 



Dixon's overall test of segregation tests the hypothesis that expected cell counts in the NNCT are as in Equation (2). In
the two-class case, he calculates $Z_{ii} = (N_{ii} - E[N_{ii}])/\sqrt{Var[N_{ii}]}$ for both $i \in \{1,2\}$ and combines these test statistics
into a statistic that is asymptotically distributed as $\chi^2_2$ under RL (Dixon (1994)). Under RL, the suggested test statistic
is given by

$$C_D = Y' \Sigma^{-1} Y =
\begin{bmatrix} N_{11} - E[N_{11}] \\ N_{22} - E[N_{22}] \end{bmatrix}'
\begin{bmatrix} Var[N_{11}] & Cov[N_{11}, N_{22}] \\ Cov[N_{11}, N_{22}] & Var[N_{22}] \end{bmatrix}^{-1}
\begin{bmatrix} N_{11} - E[N_{11}] \\ N_{22} - E[N_{22}] \end{bmatrix} \tag{6}$$

where $E[N_{ii}]$ are as in Equation (2), $Var[N_{ii}]$ are as in Equation (4), and

$$Cov[N_{11}, N_{22}] = (n^2 - 3n - Q + R)\, p_{1122} - n^2\, p_{11}\, p_{22}.$$

Dixon's $C_D$ statistic given in Equation (6) can also be written as

$$C_D = \frac{Z_{11}^2 + Z_{22}^2 - 2\, r\, Z_{11} Z_{22}}{1 - r^2} \tag{7}$$

where $r = Cov[N_{11}, N_{22}] / \sqrt{Var[N_{11}]\, Var[N_{22}]}$ (Dixon (1994)).



Under CSR independence, the expected values, variances and covariances are as in the RL case. However, the
variance and covariance terms include $Q$ and $R$ which are random under CSR independence and fixed under RL. Hence
Dixon's test statistic $C_D$ asymptotically has a $\chi^2_2$-distribution under CSR independence conditional on $Q$ and $R$.
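
The two expressions for Dixon's overall statistic (the quadratic form and the form in terms of the standardized diagonal cell counts and their correlation) can be checked against each other numerically. The sketch below is illustrative only: the function names, the cell counts, and the values of $Q$ and $R$ are hypothetical, and the moments are coded from the RL expressions of this section.

```python
import numpy as np

def falling(m, k):
    """Falling factorial m (m - 1) ... (m - k + 1)."""
    out = 1.0
    for t in range(k):
        out *= (m - t)
    return out

def diagonal_moments(n_i, n, q, r):
    """E[N_ii] and Var[N_ii] under RL."""
    p2 = falling(n_i, 2) / falling(n, 2)   # p_ii
    p3 = falling(n_i, 3) / falling(n, 3)   # p_iii
    p4 = falling(n_i, 4) / falling(n, 4)   # p_iiii
    mean = n * p2                          # equals n_i (n_i - 1) / (n - 1)
    var = ((n + r) * p2 + (2 * n - 2 * r + q) * p3
           + (n * n - 3 * n - q + r) * p4 - mean ** 2)
    return mean, var

def dixon_overall(n11, n22, n1, n2, q, r):
    """Return C_D computed as a quadratic form and via the correlation form."""
    n = n1 + n2
    m1, v1 = diagonal_moments(n1, n, q, r)
    m2, v2 = diagonal_moments(n2, n, q, r)
    p1122 = falling(n1, 2) * falling(n2, 2) / falling(n, 4)
    cov = (n * n - 3 * n - q + r) * p1122 - m1 * m2   # m1 * m2 = n^2 p_11 p_22
    y = np.array([n11 - m1, n22 - m2])
    sigma = np.array([[v1, cov], [cov, v2]])
    c_quadratic = float(y @ np.linalg.solve(sigma, y))
    z1, z2 = y / np.sqrt([v1, v2])
    rho = cov / np.sqrt(v1 * v2)
    c_correlation = float((z1 ** 2 + z2 ** 2 - 2 * rho * z1 * z2) / (1 - rho ** 2))
    return c_quadratic, c_correlation

# Hypothetical inputs: n_1 = n_2 = 30, diagonal counts 20 and 18, Q = R = 38.
c_d, c_d_alt = dixon_overall(20, 18, 30, 30, 38, 38)
```

The two returned values agree up to rounding error, since the second expression is simply the $2 \times 2$ quadratic form written out; the result would be referred to the $\chi^2_2$ distribution.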



3.5 New Overall Segregation Tests Based on NNCTs 



First, we propose tests based on the correct sampling distribution of the cell counts in a NNCT under RL or CSR 
independence. Then, we suggest a transformation based on Monte Carlo simulations to correct for the effect of the 
dependence between the cell counts. Thereby, we adjust Pielou's test for location and scale to render it have the desired 
level under the null case. In defining the new segregation or clustering tests, we follow a track similar to that of Dixon
(Dixon (1994)). For each cell, we define a new type of cell-specific test statistic, and then combine these four tests into
one overall test. 



3.5.1 First Version of the New Segregation Tests 



Recall that in Equation (1), $E_P[N_{ij}] = n_i c_j/n$. Asymptotically, $X_P^2$ has a $\chi^2_1$-distribution only when the NNCT is based
on a random sample of (base,NN) pairs, which is not the case under CSR independence or RL (Ceyhan (2006)).



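Similar to Dixon's cell-specific tests in Section 3.4.1, we consider the following test statistics for cells in the NNCT

$$T^{I}_{ij} = N_{ij} - \frac{n_i\, C_j}{n}. \tag{8}$$

Under RL, row sums $N_i = n_i$ are fixed while column sums $C_j$ are random quantities. Hence, conditional on $C_j = c_j$,
$T^{I}_{ij} = N_{ij} - n_i c_j/n$; and let

$$N^{I}_{ij} = \frac{T^{I}_{ij}}{\sqrt{n_i c_j/n}} = \frac{N_{ij} - n_i c_j/n}{\sqrt{n_i c_j/n}}, \tag{9}$$

then $X_P^2 = \sum_{i=1}^{2} \sum_{j=1}^{2} \left(N^{I}_{ij}\right)^2$. Under RL, we find that

$$E\left[T^{I}_{ij}\right] = \begin{cases} \dfrac{n_i (n_i - n)}{n (n - 1)} & \text{if } i = j, \\[2mm] \dfrac{n_i\, n_j}{n (n - 1)} & \text{if } i \neq j. \end{cases} \tag{10}$$

Notice that under RL, $E\left[T^{I}_{ij}\right] \neq 0$, which implies that $E\left[N^{I}_{ij}\right] \neq 0$. Furthermore,

$$\lim_{n_i, n_j \to \infty} E\left[T^{I}_{ij}\right] = \begin{cases} \nu_i (\nu_i - 1) & \text{if } i = j, \\ \nu_i\, \nu_j & \text{if } i \neq j, \end{cases} \tag{11}$$

where $\nu_i$ is the probability that an individual is of class $i$ and $n_i, n_j \to \infty$ means that $\min(n_i, n_j) \to \infty$. However,
although $E\left[N^{I}_{ij}\right]$ is not analytically tractable, $\lim_{n_i, n_j \to \infty} E\left[N^{I}_{ij}\right] = 0$, since $n_i c_j/n^2 \to \nu_i \nu_j$, which implies
$1/\sqrt{n_i c_j/n} \to 0$ as $n_i, n_j \to \infty$.

Let $\mathbf{N}_I$ be the vector of $N^{I}_{ij}$ values concatenated row-wise and let $\Sigma_I$ be the variance-covariance matrix of $\mathbf{N}_I$
based on the correct sampling distribution of the cell counts. That is, $\Sigma_I = \left(Cov\left[N^{I}_{ij}, N^{I}_{kl}\right]\right)$ where

$$Cov\left[N^{I}_{ij}, N^{I}_{kl}\right] = \frac{n}{\sqrt{n_i\, c_j\, n_k\, c_l}}\; Cov[N_{ij}, N_{kl}]$$

with $Cov[N_{ij}, N_{kl}]$ being the variance of Dixon's cell-specific tests (Section 3.4.1) if $(i,j) = (k,l)$ and the covariance
given for Dixon's overall test (Section 3.4.2) if $(i,j) = (1,1)$ and $(k,l) = (2,2)$. Since $\Sigma_I$ is not invertible, we use its
generalized inverse, $\Sigma_I^{-}$ (Searle (2006)). Then the proposed test statistic for overall segregation is the quadratic form

$$X_I^2 = \mathbf{N}_I'\, \Sigma_I^{-}\, \mathbf{N}_I \tag{12}$$

which asymptotically has a $\chi^2_1$ distribution.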

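The test statistic $X_I^2$ can be obtained by adding a correction term to $X_P^2$. Recall that $X_P^2 = \mathbf{N}_I' \mathbf{N}_I$;
hence

$$X_I^2 = \mathbf{N}_I' \Sigma_I^{-} \mathbf{N}_I = \mathbf{N}_I' \left(\Sigma_I^{-} - I_2 + I_2\right) \mathbf{N}_I = \mathbf{N}_I' \mathbf{N}_I + \mathbf{N}_I' \left(\Sigma_I^{-} - I_2\right) \mathbf{N}_I = X_P^2 + \Delta_C,$$

where $\Delta_C = \mathbf{N}_I' \left(\Sigma_I^{-} - I_2\right) \mathbf{N}_I$ and $I_2$ is the $2 \times 2$ identity matrix. Furthermore, $\Sigma_I$ can be obtained from $\Sigma$ in Equation
(6) by multiplying $\Sigma$ entry-wise with the matrix $C^{I}_{M} = \left( n / \sqrt{n_i\, c_j\, n_k\, c_l} \right)$. Since $T^{I}_{ij}$ is conditional on $C_j = c_j$, the segregation
test with $X_I^2$ is conditional on the column sums.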

Under CSR independence, the expected values, variances, and covariances related to $X_I^2$ are as in the RL case,
except they are not only conditional on column sums (i.e., on $C_j = c_j$), but also conditional on $Q$ and $R$. Hence $X_I^2$ has
asymptotically a $\chi^2_1$ distribution conditional on column sums, $Q$, and $R$ under CSR independence.



3.5.2 Second Version of the New Segregation Tests 



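For large $n$, we have $n_i c_j/n \approx n\, \nu_i\, \kappa_j$, where $\kappa_j$ is the probability that a NN is of class $j$. Under CSR independence or
RL, $\nu_i = \kappa_i$ for all $i = 1, 2$, then $n_i c_j/n \approx n\, \nu_i\, \nu_j$ for large $n$. This suggests the following test statistics for the four cells,

$$T^{II}_{ij} = N_{ij} - \frac{n_i\, n_j}{n}. \tag{13}$$

Let

$$N^{II}_{ij} = \frac{T^{II}_{ij}}{\sqrt{n_i n_j/n}} = \frac{N_{ij} - n_i n_j/n}{\sqrt{n_i n_j/n}}. \tag{14}$$

Under RL, we find that

$$E\left[T^{II}_{ij}\right] = \begin{cases} \dfrac{n_i (n_i - n)}{n (n - 1)} & \text{if } i = j, \\[2mm] \dfrac{n_i\, n_j}{n (n - 1)} & \text{if } i \neq j. \end{cases} \tag{15}$$

Hence

$$\lim_{n_i, n_j \to \infty} E\left[T^{II}_{ij}\right] = \begin{cases} \nu_i (\nu_i - 1) & \text{if } i = j, \\ \nu_i\, \nu_j & \text{if } i \neq j. \end{cases} \tag{16}$$

Notice that under RL, $E\left[T^{II}_{ij}\right] = E\left[T^{I}_{ij}\right] \neq 0$, which implies that $E\left[N^{II}_{ij}\right] \neq 0$. Furthermore,

$$E\left[N^{II}_{ij}\right] = \begin{cases} \dfrac{(n_i - n)}{(n - 1) \sqrt{n}} & \text{if } i = j, \\[2mm] \dfrac{\sqrt{n_i\, n_j}}{(n - 1) \sqrt{n}} & \text{if } i \neq j. \end{cases} \tag{17}$$

Note that the sum of the squares of $N^{II}_{ij}$ does not equal $X_P^2$. Let $\mathbf{N}_{II}$ be the vector of $N^{II}_{ij}$ concatenated row-wise
and let $\Sigma_{II}$ be the variance-covariance matrix of $\mathbf{N}_{II}$ based on the correct sampling distribution of the cell counts. That
is, $\Sigma_{II} = \left(Cov\left[N^{II}_{ij}, N^{II}_{kl}\right]\right)$ where

$$Cov\left[N^{II}_{ij}, N^{II}_{kl}\right] = \frac{n}{\sqrt{n_i\, n_j\, n_k\, n_l}}\; Cov[N_{ij}, N_{kl}].$$

Since $\Sigma_{II}$ is not invertible, we use its
generalized inverse $\Sigma_{II}^{-}$. Then the proposed test statistic for overall segregation is the quadratic form

$$X_{II}^2 = \mathbf{N}_{II}'\, \Sigma_{II}^{-}\, \mathbf{N}_{II} \tag{18}$$

which asymptotically has a $\chi^2_2$ distribution under RL. Note that $\Sigma_{II}$ can be obtained from $\Sigma$ by multiplying $\Sigma$ entry-wise
with the matrix $C^{II}_{M} = \left( n / \sqrt{n_i\, n_j\, n_k\, n_l} \right)$. This version of the segregation test is asymptotically equivalent to Dixon's test
of segregation.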

Under CSR independence, the expectations, variances, and covariances related to $X_{II}^2$ are as in the RL case, except
the variances and covariances are conditional on $Q$ and $R$. Hence, the asymptotic $\chi^2_2$ distribution of $X_{II}^2$ is also conditional
on $Q$ and $R$.
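
The use of a generalized inverse for the second version can be illustrated numerically. The sketch below rests on assumptions that are not spelled out in the text: since the row sums are fixed in the two-class case, $N_{12} = n_1 - N_{11}$ and $N_{21} = n_2 - N_{22}$, so the $4 \times 4$ covariance matrix of the cell counts is taken here as $A \Sigma A'$, with $\Sigma$ the $2 \times 2$ covariance matrix of $(N_{11}, N_{22})$ and $A$ the matrix of this linear map. The function name and the numerical inputs are hypothetical.

```python
import numpy as np

def version_two_statistic(table, sigma_diag):
    """Quadratic form N_II' Sigma_II^- N_II for a 2 x 2 NNCT.

    sigma_diag is the 2 x 2 covariance matrix of (N_11, N_22) under the null.
    """
    table = np.asarray(table, dtype=float)
    sizes = table.sum(axis=1)                      # row sums n_1, n_2 (fixed)
    n = sizes.sum()
    expected = np.outer(sizes, sizes) / n          # n_i n_j / n
    n_two = ((table - expected) / np.sqrt(expected)).ravel()   # N^II_ij, row-wise
    # (N_11, N_12, N_21, N_22) = constant + A (N_11, N_22) because row sums are fixed
    a = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, -1.0], [0.0, 1.0]])
    sigma_cells = a @ np.asarray(sigma_diag, dtype=float) @ a.T
    d = 1.0 / np.sqrt(expected.ravel())
    sigma_two = sigma_cells * np.outer(d, d)       # entry-wise factor n / sqrt(n_i n_j n_k n_l)
    x2 = float(n_two @ np.linalg.pinv(sigma_two, rcond=1e-10) @ n_two)
    return x2, sigma_two

table = [[20, 10], [12, 18]]                       # hypothetical NNCT
sigma_diag = [[8.6, 3.7], [3.7, 8.6]]              # hypothetical covariance of (N_11, N_22)
x2_two, sigma_two = version_two_statistic(table, sigma_diag)
```

Under these assumptions the $4 \times 4$ matrix has rank 2, and the Moore-Penrose inverse (used here as one choice of generalized inverse) gives the same value as the $2 \times 2$ quadratic form in $(N_{11} - n_1^2/n,\; N_{22} - n_2^2/n)$, which makes the connection to Dixon's statistic easy to see.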



3.5.3 Third Version of the New Segregation Tests 



Among the first two versions we discussed so far, version I of the new tests is a conditional test (conditional on column 
sums), while version II of the new tests is asymptotically equivalent to Dixon's test, although different from it for finite 
samples. Furthermore, both Dixon's test and version II of the new tests incorporate only row sums (class sizes) in the 
NNCTs. 



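Now, for the NNCT cells, we suggest the following test statistics which use both the row and column sums (i.e.,
number of times a class serves as NN) and are not conditional on the column sums:

$$T^{III}_{ij} = \begin{cases} N_{ii} - \dfrac{(n_i - 1)}{(n - 1)}\, C_i & \text{if } i = j, \\[2mm] N_{ij} - \dfrac{n_i}{(n - 1)}\, C_j & \text{if } i \neq j. \end{cases} \tag{19}$$

Note that $E\left[T^{III}_{ij}\right] = 0$ under RL, but the sum of the squares of $T^{III}_{ij}$ does not equal $X_P^2$. Let $\mathbf{N}_{III}$ be the vector of
$T^{III}_{ij}$ values concatenated row-wise and let $\Sigma_{III}$ be the variance-covariance matrix of $\mathbf{N}_{III}$ based on the correct sampling
distribution of the cell counts. That is, $\Sigma_{III} = \left(Cov\left[T^{III}_{ij}, T^{III}_{kl}\right]\right)$ where $Cov\left[T^{III}_{ij}, T^{III}_{kl}\right]$ has the following forms based
on the pairs $(i,j)$ and $(k,l)$.

Case (1) For $i = j$, $k = l$,

$$\begin{aligned}
Cov\left[T^{III}_{ii}, T^{III}_{kk}\right] &= Cov\left[N_{ii} - \frac{(n_i - 1)}{(n - 1)} C_i,\; N_{kk} - \frac{(n_k - 1)}{(n - 1)} C_k\right] \\
&= Cov[N_{ii}, N_{kk}] - \frac{(n_k - 1)}{(n - 1)} Cov[N_{ii}, C_k] - \frac{(n_i - 1)}{(n - 1)} Cov[N_{kk}, C_i] + \frac{(n_i - 1)(n_k - 1)}{(n - 1)^2} Cov[C_i, C_k].
\end{aligned}$$

Case (2) For $i = j$, $k \neq l$,

$$\begin{aligned}
Cov\left[T^{III}_{ii}, T^{III}_{kl}\right] &= Cov\left[N_{ii} - \frac{(n_i - 1)}{(n - 1)} C_i,\; N_{kl} - \frac{n_k}{(n - 1)} C_l\right] \\
&= Cov[N_{ii}, N_{kl}] - \frac{n_k}{(n - 1)} Cov[N_{ii}, C_l] - \frac{(n_i - 1)}{(n - 1)} Cov[N_{kl}, C_i] + \frac{(n_i - 1)\, n_k}{(n - 1)^2} Cov[C_i, C_l].
\end{aligned}$$

Case (3) For $i \neq j$, $k = l$,

$$\begin{aligned}
Cov\left[T^{III}_{ij}, T^{III}_{kk}\right] &= Cov\left[N_{ij} - \frac{n_i}{(n - 1)} C_j,\; N_{kk} - \frac{(n_k - 1)}{(n - 1)} C_k\right] \\
&= Cov[N_{ij}, N_{kk}] - \frac{(n_k - 1)}{(n - 1)} Cov[N_{ij}, C_k] - \frac{n_i}{(n - 1)} Cov[N_{kk}, C_j] + \frac{n_i (n_k - 1)}{(n - 1)^2} Cov[C_j, C_k].
\end{aligned}$$

Case (4) For $i \neq j$, $k \neq l$,

$$\begin{aligned}
Cov\left[T^{III}_{ij}, T^{III}_{kl}\right] &= Cov\left[N_{ij} - \frac{n_i}{(n - 1)} C_j,\; N_{kl} - \frac{n_k}{(n - 1)} C_l\right] \\
&= Cov[N_{ij}, N_{kl}] - \frac{n_k}{(n - 1)} Cov[N_{ij}, C_l] - \frac{n_i}{(n - 1)} Cov[N_{kl}, C_j] + \frac{n_i\, n_k}{(n - 1)^2} Cov[C_j, C_l],
\end{aligned}$$

where

$$Cov[N_{ij}, C_l] = Cov[N_{ij}, N_{1l} + N_{2l}] = Cov[N_{ij}, N_{1l}] + Cov[N_{ij}, N_{2l}],$$

and

$$\begin{aligned}
Cov[C_j, C_l] &= Cov[N_{1j} + N_{2j},\; N_{1l} + N_{2l}] \\
&= Cov[N_{1j}, N_{1l}] + Cov[N_{1j}, N_{2l}] + Cov[N_{2j}, N_{1l}] + Cov[N_{2j}, N_{2l}].
\end{aligned} \tag{20}$$

Since $\Sigma_{III}$ is not invertible, we use its generalized inverse $\Sigma_{III}^{-}$. Then the proposed test statistic for overall segregation
is the quadratic form

$$X_{III}^2 = \mathbf{N}_{III}'\, \Sigma_{III}^{-}\, \mathbf{N}_{III}, \tag{21}$$

which asymptotically has a $\chi^2_1$ distribution.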

Under CSR independence, the discussion related to and derivation of $X_{III}^2$ are as in the RL case; however, the
variance and covariance terms (hence the asymptotic distribution) are conditional on $Q$ and $R$.
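
As a short worked check of the centering of the third version (a step not spelled out in the text), assume the two-class case with $n = n_i + n_j$ for $i \neq j$, the RL expectations $E[N_{ii}] = n_i(n_i-1)/(n-1)$ and $E[N_{ij}] = n_i n_j/(n-1)$, and cell statistics of the form $N_{ii} - \frac{n_i - 1}{n-1} C_i$ on the diagonal and $N_{ij} - \frac{n_i}{n-1} C_j$ off the diagonal. Then the expected column sums equal the class sizes and the cell statistics have mean zero:

```latex
\begin{align*}
E[C_j] &= E[N_{jj}] + E[N_{ij}]
        = \frac{n_j (n_j - 1)}{n - 1} + \frac{n_i\, n_j}{n - 1}
        = \frac{n_j (n_i + n_j - 1)}{n - 1} = n_j, \\
E\left[T^{III}_{ii}\right] &= \frac{n_i (n_i - 1)}{n - 1} - \frac{(n_i - 1)}{(n - 1)}\, n_i = 0, \\
E\left[T^{III}_{ij}\right] &= \frac{n_i\, n_j}{n - 1} - \frac{n_i}{(n - 1)}\, n_j = 0 \quad (i \neq j).
\end{align*}
```

This is in contrast to the first two versions, whose cell statistics have nonzero expectation under RL for finite samples.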



3.5.4 Correcting Pielou's Test for CSR Independence Based on Monte Carlo Simulations 



For the null case, we simulate the CSR independence case only with classes 1 and 2 (i.e., $X$ and $Y$) of sizes $n_1$
and $n_2$, respectively. At each of $N_{mc} = 10000$ replicates, under $H_o$, we generate data for the pairs of $(n_1, n_2) \in
\{(10, 10), (10, 30), (10, 50), (30, 30), (30, 50), (50, 50), (100, 100), (200, 200)\}$ points iid (independently and identically
distributed) from $\mathcal{U}((0, 1) \times (0, 1))$, the uniform distribution on the unit square. These sample size combinations are chosen
so that one can examine the influence of small and large samples, and similar and very different sample sizes on the tests.
The corresponding test statistics are recorded at each Monte Carlo replication for each sample size combination. In the
Appendix, the kernel density estimates for Pielou's test statistic and the density plot of the $\chi^2_1$-distribution
are provided in a figure in order to make distributional comparisons. The histograms (not presented) follow the trend of a
chi-square distribution, but need an adjustment for location and scale. The means and variances of Pielou's test
statistics for each sample size combination are also tabulated. Using these statistics, we transform the
$X_P^2$ scores by adjusting for location and scale as

$$X^2_{P,mc} = \frac{X_P^2 + 0.013}{1.643} \tag{22}$$

so that the transformed statistic will be approximately distributed as $\chi^2_1$.
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By construction this Monte Carlo correction is only appropriate when the null pattern is the CSR independence 
with rectangular study regions. Furthermore, the location and scale adjustments are based on large sample mean and 
variance estimates of Piel ou's test of segregation for s imilar sample sizes. In fact we used large ni — 712 values in the 
Monte Carlo simulations. iMeagher and Burdickl (|l980l ) propose and illustrate calculating critical values of Pielou's test 
statist ic under RL using simulation. This is similar to our Monte Carlo test, but not identical: iMeagher and Burdickl 
(|1980D use a Monte-Carlo computation of the critical value while we use a Monte Carlo based moment adjustment, which 
is intended as a simple and quick fix for Pielou's test. Furthermore, a Monte Carlo hypothesis testing may not be easily 
applicable for the CSR independence pattern (e.g., when the region is very complicated), but a randomization test can 
easily be conducted for the RL pattern. 
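The moment adjustment in Equation (22) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the constants 0.013 and 1.643 are those of Equation (22), and the $p$-value uses the standard identity $P(\chi^2_1 > x) = \mathrm{erfc}(\sqrt{x/2})$ rather than any code from the original study.

```python
import math


def pielou_mc_corrected(x_p: float) -> float:
    """Location/scale adjustment of Pielou's statistic as in Equation (22)."""
    return (x_p + 0.013) / 1.643


def chi2_1_pvalue(x: float) -> float:
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    if x <= 0.0:
        return 1.0
    # If Z is standard normal, P(Z^2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2.0))


# Hypothetical observed value of Pielou's statistic (not taken from the paper).
x_p_observed = 6.0
x_corrected = pielou_mc_corrected(x_p_observed)
p_uncorrected = chi2_1_pvalue(x_p_observed)
p_corrected = chi2_1_pvalue(x_corrected)
```

Because the corrected statistic is smaller than the raw one for any positive value, its $p$-value is larger, which is in line with the liberal behavior of the uncorrected test described in the text.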

Remark 3.2. For Dixon's test and the new versions of the NNCT-tests, such a correction for means and variances is not necessary, as we start with the correct sampling distribution of the cell counts. However, for small sample size combinations, the estimated variances of $C_D$ and $\mathcal{X}_{II}^2$ are smaller than 4 (not presented), while the means are around 2, which explains the slightly conservative nature of these tests for small samples. On the other hand, $\mathcal{X}_I^2$ and $\mathcal{X}_{III}^2$ have estimated means around 1, while their estimated variances are smaller than 2. For small samples, one can transform these tests to have the appropriate variance, while retaining their means. However, this does not seem to be worth the effort. □

Remark 3.3. Extension of NNCT-Tests to Multi-Class Case: In Sections 3.4.2 and 3.5 we describe the segregation tests for the two-class case, in which the corresponding NNCT is of dimension $2 \times 2$. For $q$ classes with $q > 2$, the NNCT will be of dimension $q \times q$. Pielou's test readily extends to $q \times q$ contingency tables, but it will still be inappropriate for use in $q \times q$ NNCTs. The Monte Carlo corrected version in Section 3.5.4 is designed for the two-class case for rectangular regions. For more classes, such a correction can be carried out in a similar fashion. The cell counts for the diagonal cells have asymptotic normality. For the off-diagonal cells, although the asymptotic normality is supported by extensive Monte Carlo simulation results (Dixon (2002a)), it is not rigorously proven yet. Nevertheless, if the asymptotic normality held for all $q^2$ cell counts in the NNCT, under RL, Dixon's test and version II of the new tests would have a $\chi^2_{q(q-1)}$ distribution, and versions I and III of the new tests would have a $\chi^2_{(q-1)^2}$ distribution, asymptotically. Under CSR independence, these tests will have the corresponding asymptotic distributions conditional on $Q$ and $R$. □



4 Other Tests of Spatial Clustering 

There are many tests for spatial clustering of points from one class or multiple classes in the literature (Diggle (2003) and Kulldorff (2006)). Among them are Ripley's $K$- or $L$-functions (Ripley (2004)), Diggle's $D$-function, which is a modified version of Ripley's $K$-function (Diggle (2003) p. 131), the pair correlation function (Stoyan and Stoyan (1994)), the univariate $J$-function (van Lieshout and Baddeley (1996)) and multivariate $J$-function (van Lieshout and Baddeley (1999)), and many other first and second order tests (see Perry et al. (2006) for a detailed review of spatial pattern tests in plant ecology). There are also spatial pattern tests that adjust for an inhomogeneity and are mostly used for clustering of cases in epidemiology (Kulldorff (2006)). Among them are Cuzick and Edward's $k$-NN tests (Cuzick and Edwards (1990)), the spatial scan statistic of Kulldorff (1997), Whittemore's test, Tango's MEET, Besag-Newell's $R$, and Moran's $I$ (Song and Kulldorff (2003)). An extensive survey of such tests is provided by Kulldorff (2006).

Among the above clustering tests, univariate tests are not comparable with NNCT-tests; Moran's $I$ and Whittemore's tests are shown to perform poorly in detecting some kinds of clustering (Song and Kulldorff (2003)); and most of the tests require Monte Carlo simulation or randomization methods to attach significance to their results. Hence we only consider Cuzick-Edward's $k$-NN tests and their combined versions (Cuzick and Edwards (1990)), and compare NNCT-tests with these tests in an extensive simulation study. We also compare NNCT-tests with Ripley's $L$-function, Diggle's $D$-function, and the pair correlation functions for the appropriate null hypotheses in the examples, as they are perhaps the most commonly used tests for spatial interaction at various scales, although they are based on Monte Carlo simulation.

Cuzick-Edward's $k$-NN test is defined as $T_k = \sum_{i=1}^{n} \delta_i \, d_i^k$, where

$$\delta_i = \begin{cases} 1 & \text{if } z_i \text{ is a case,} \\ 0 & \text{if } z_i \text{ is a control,} \end{cases} \tag{23}$$

with $z_i$ being the $i^{th}$ point and $d_i^k$ being the number of cases among the $k$ NNs of $z_i$. Since, in practice, the correct choice of $k$ is not known in advance, Cuzick and Edwards (1990) also suggest combining information for various $T_k$ values. Assuming
multivariate normality of $T_k$ values and $T_k$ being a mixture of shifts all in the same direction under an alternative, the combined test statistic is given by

$$T_S^{comb} = \mathbf{1}' \, \Sigma^{-1/2} \, \mathbf{T} \tag{24}$$

where $S = \{k_1, k_2, \ldots, k_m\}$ and $\mathbf{T} = (T_{k_1}, T_{k_2}, \ldots, T_{k_m})'$ (i.e., $T_S^{comb}$ is the test obtained by combining $T_k$ whose indices are in $S$), $\mathbf{1}' = (1, 1, \ldots, 1)$, and $\Sigma = \mathbf{Cov}[\mathbf{T}]$ is the variance-covariance matrix of $\mathbf{T}$. Under $H_o$: RL of cases and controls to the given locations in the study region, $T_k$ converges in law to $N(\mathbf{E}[T_k], \mathbf{Var}[T_k]/n_0)$; similarly, $T_S^{comb}$ converges in law to $N(\mathbf{E}[T_S^{comb}], \mathbf{Var}[T_S^{comb}])$ when the number of cases $n_0$ goes to infinity. The expected values $\mathbf{E}[T_k]$ and $\mathbf{E}[T_S^{comb}]$ and variances $\mathbf{Var}[T_k]$ and $\mathbf{Var}[T_S^{comb}]$ are provided in Cuzick and Edwards (1990).
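To make the definition of the $k$-NN statistic concrete, the following is a minimal brute-force sketch of $T_k$ (the sum, over cases, of the number of cases among their $k$ NNs). The function name, the tie-breaking rule (by index order), and the toy data are assumptions for illustration only; the standardization by $\mathbf{E}[T_k]$ and $\mathbf{Var}[T_k]$ from Cuzick and Edwards (1990) is not included.

```python
import math


def cuzick_edwards_tk(points, is_case, k):
    """Raw k-NN statistic: for each case, count the cases among its k NNs."""
    n = len(points)
    total = 0
    for i in range(n):
        if not is_case[i]:
            continue  # delta_i = 0 for controls, so they contribute nothing
        # Brute-force ordering of the other points by distance from point i;
        # sorted() is stable, so distance ties are broken by index order.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )
        total += sum(1 for j in others[:k] if is_case[j])
    return total


# Toy data (hypothetical): two cases close together, two controls far away.
toy_points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
toy_cases = [True, True, False, False]
t1 = cuzick_edwards_tk(toy_points, toy_cases, k=1)
t2 = cuzick_edwards_tk(toy_points, toy_cases, k=2)
```

In the toy data each case has the other case as its first NN and a control as its second NN, so both `t1` and `t2` equal 2. The double loop over points with a sort is quadratic or worse in $n$, which gives a feel for why repeating the computation over many Monte Carlo replicates is expensive.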

The computational order of Cuzick-Edward's fc-NN test is of O(n^), while if r distinct tests are combined it 
is O (r^n^) . Although theoretically, both versions are of the same order for fixed r, in practice it might make a big 
difference in computation time. In fact, the computation of $T_k$ for $k = 1, 2, \ldots, 5$ for $n_1 = n_2 = 50$, repeated 10000 times, took about a day, while $T_S^{comb}$ took about 10 days on an Intel Pentium 4 2.4 GHz machine with 1 GB memory and 40 GB storage.



When the NNCT-tests and $k$-NN tests indicate significant segregation or clustering, one might also be interested in the (possible) causes of the segregation and the type and level of interaction between the classes at different scales (i.e., inter-point distances). To answer such questions, we calculate Ripley's (univariate) $L$-function, which is a modified version of the $K$-function; these are denoted by $L_{ii}(t)$ and $K_{ii}(t)$ for class $i$, respectively. The univariate $L$-function is estimated as $\widehat{L}_{ii}(t) = \sqrt{\widehat{K}_{ii}(t)/\pi}$, where $t$ is the distance from a randomly chosen event and $\widehat{K}_{ii}(t)$ is an estimator of

$$K_{ii}(t) = \lambda^{-1} \, \mathbf{E}[\text{\# of extra events within distance } t \text{ of a randomly chosen event}] \tag{25}$$

with $\lambda$ being the density (number per unit area) of events, and is calculated as

$$\widehat{K}_{ii}(t) = \widehat{\lambda}^{-1} \sum_{i} \sum_{j \neq i} w(l_i, d_{ij}) \, \mathbf{I}(d_{ij} < t)/N \tag{26}$$

where $\widehat{\lambda} = N/A$ is an estimate of density ($N$ is the observed number of points and $A$ is the area of the study region), $d_{ij}$ is the distance between points $i$ and $j$, $\mathbf{I}(\cdot)$ is the indicator function, and $w(l_i, d_{ij})$ is the proportion of the circumference of the circle centered at $l_i$ (the location of point $i$) with radius $d_{ij}$ that falls in the study area, which corrects for the boundary effects. Under CSR independence, $L_{ii}(t) - t = 0$ holds. If the univariate pattern exhibits aggregation, then $L_{ii}(t) - t$ tends to be positive; if it exhibits regularity, then $L_{ii}(t) - t$ tends to be negative. See (Diggle (2003)) for more detail.
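The structure of the estimator in Equation (26) can be sketched as follows. This is a deliberately simplified illustration, not the estimator used in the paper: all edge-correction weights are set to 1 (an assumption that biases the estimate downward near the boundary), and the function names are hypothetical.

```python
import math


def ripley_k_no_edge(points, t, area):
    """Naive K estimate: ordered pairs closer than t, scaled by 1/(lambda_hat * N).

    Assumption: every edge-correction weight is taken to be 1.
    """
    n = len(points)
    lam_hat = n / area  # estimated density, N / A
    close_pairs = 0
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(points[i], points[j]) < t:
                close_pairs += 1
    return close_pairs / (lam_hat * n)


def ripley_l(k_hat):
    """L estimate obtained from a K estimate: sqrt(K / pi)."""
    return math.sqrt(k_hat / math.pi)


# Toy data (hypothetical): the four corners of the unit square.
corners = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
k_hat = ripley_k_no_edge(corners, t=1.1, area=1.0)
l_minus_t = ripley_l(k_hat) - 1.1
```

With these four points, each point has two neighbors within distance 1.1, so there are 8 ordered pairs and the estimate is $8/(4 \cdot 4) = 0.5$. In practice one would evaluate $\widehat{L}_{ii}(t) - t$ over a grid of $t$ values and compare it with simulation envelopes under the null pattern.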

We also provide Ripley's bivariate $L$-function, denoted by $L_{ij}(t)$ for classes $i$ and $j$ and estimated as $\widehat{L}_{ij}(t) = \sqrt{\widehat{K}_{ij}(t)/\pi}$, where $\widehat{K}_{ij}(t)$ is an estimator of

$$K_{ij}(t) = \lambda_j^{-1} \, \mathbf{E}[\text{\# of extra type } j \text{ events within distance } t \text{ of a randomly chosen type } i \text{ event}]$$

with $\lambda_j$ being the density of type $j$ events, and is calculated as

$$\widehat{K}_{ij}(t) = \left(\widehat{\lambda}_i \widehat{\lambda}_j A\right)^{-1} \sum_{k} \sum_{l} w(i_k, d_{i_k,j_l}) \, \mathbf{I}(d_{i_k,j_l} < t) \tag{27}$$

where $d_{i_k,j_l}$ is the distance between the $k^{th}$ type $i$ and $l^{th}$ type $j$ points, and $w(i_k, d_{i_k,j_l})$ is the proportion of the circumference of the circle centered at the $k^{th}$ type $i$ point with radius $d_{i_k,j_l}$ that falls in the study area, which is used for edge correction. Under CSR independence, $L_{ij}(t) - t = 0$ holds. If the bivariate pattern is the segregation of the classes or species, then $L_{ij}(t) - t$ tends to be negative; if it is association of the classes or species, then $L_{ij}(t) - t$ tends to be positive. See (Diggle (2003)) for more detail.



However, Ripley's $K$-function is cumulative, so interpreting the spatial interaction at larger distances is problematic (Wiegand et al. (2007)). The pair correlation function $g(t)$ is better for this purpose (Stoyan and Stoyan (1994)). The pair correlation function of a (univariate) stationary point process is defined as

$$g(t) = \frac{K'(t)}{2 \pi t}$$

where $K'(t)$ is the derivative of $K(t)$. For a univariate stationary Poisson process, $g(t) = 1$; values of $g(t) < 1$ suggest inhibition (or regularity) between points; and values of $g(t) > 1$ suggest clustering (or aggregation). The same definition of the pair correlation function can be applied to Ripley's bivariate $K$- or $L$-functions as well. The benchmark value of $K_{ij}(t) = \pi t^2$ corresponds to $g(t) = 1$; $g(t) < 1$ suggests segregation of the classes; and $g(t) > 1$ suggests association of the classes. However, the pair correlation function estimates might have critical behavior for small $t$ if $g(t) > 0$, since the estimator variance and hence the bias are considerably large. This problem gets worse especially in cluster processes (Stoyan and Stoyan (1996)). So pair correlation function analysis is more reliable for larger distances, and it might be safer to use $g(t)$ for distances larger than the average NN distance in the data set.
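As a quick check that the two benchmarks describe the same null pattern, the following worked step uses only the standard fact that $K(t) = \pi t^2$ for a homogeneous planar Poisson process, together with the definitions of $g(t)$ as $K'(t)/(2\pi t)$ and of $L(t)$ as $\sqrt{K(t)/\pi}$:

```latex
K(t) = \pi t^2
\;\Longrightarrow\;
K'(t) = 2 \pi t
\;\Longrightarrow\;
g(t) = \frac{K'(t)}{2 \pi t} = 1,
\qquad
L(t) = \sqrt{K(t)/\pi} = t
\;\Longrightarrow\;
L(t) - t = 0 .
```

Since $g(t)$ depends on the derivative of $K(t)$ rather than on its accumulated value, a departure from 1 at distance $t$ reflects interaction at that distance only, which is the non-cumulative property noted in the text.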

When the null case is the RL of points from an inhomogeneous Poisson process, Ripley's $K$- or $L$-functions in the general form are not appropriate to test for the spatial clustering of the cases (Kulldorff (2006)). However, Diggle (2003) suggests a version based on Ripley's univariate $K$-function as $D(t) = K_{11}(t) - K_{22}(t)$. In this setup, "no spatial clustering" is equivalent to RL of cases and controls on the locations in the sample, which implies $D(t) = 0$, since $K_{22}(t)$ measures the degree of spatial aggregation of the controls (i.e., the population at risk), while $K_{11}(t)$ measures this same spatial aggregation plus any additional clustering due to the disease. The test statistic $D(t)$ is estimated by $\widehat{D}(t) = \widehat{K}_{11}(t) - \widehat{K}_{22}(t)$, where $\widehat{K}_{ii}(t)$ is as in Equation (26).

Among the tests we will consider, NNCT-tests summarize the spatial interaction at the smaller scales (more specifically, for distances about the average NN distance in the data set), while Cuzick-Edward's $k$-NN and combined tests provide information on spatial interaction for distances about the average $k$-NN distance between the points. On the other hand, second order analysis by Ripley's $K$- or $L$-functions, Diggle's $D$-function, and the pair correlation function may provide information on the univariate or bivariate patterns at all scales (i.e., for all inter-point distances) of interest.

The NNCT-tests are designed for RL of classes to a set of given points. For CSR independence of classes they are 
conditional on Q and R (essentially, on the location of the points up to scale). Cuzick-Edward's tests are designed to 
detect the clustering of cases in the presence of inhomogeneity in the locations of both cases and controls. Hence, they 
are appropriate for the RL of the points in a study area; and similar to NNCT-tests, they are conditional on the locations 
of the points under CSR independence. Both NNCT and Cuzick-Edward's tests appeal to asymptotic approximation of 
the test statistics, although Monte Carlo simulation or randomization versions for them are readily available. On the 
other hand, Ripley's $K$- or $L$-functions and pair correlation functions are appropriate for the CSR independence null pattern, while Diggle's $D$-function is appropriate for either CSR independence or RL patterns. However, these tests are based on Monte Carlo simulation or randomization of points in the study area.

The order of classes in the construction of the NNCTs is irrelevant for the values, and hence for the results, of the NNCT-tests. However, by construction, Cuzick-Edward's tests are more sensitive to clustering of the cases in a case/control framework (or of the first class, which is treated as cases in Equation (23), in the generalized two-class framework). That is, they are not symmetric in the classes; i.e., if one reverses the role of cases and controls in a data set, the test statistics might give different results. The order of the classes is inconsequential for Ripley's bivariate $K$- or $L$-functions and pair correlation functions in theory, as they are symmetric in the classes they pertain to. But in practice, edge corrections will render them slightly asymmetric; i.e., $\widehat{L}_{ij}(t) \neq \widehat{L}_{ji}(t)$ for $i \neq j$. Diggle's $D$-function is dependent on the order of the classes up to a sign difference, in the sense that, if one switches the roles of the two classes, the calculated test statistics differ in sign only.



5 Empirical Significance Levels under CSR Independence 

We generate points from two classes under $H_o$: CSR independence as in Section 3.5.4. At each sample size combination, we record how many times the $p$-value is at or below $\alpha = .05$ for each test to estimate the empirical size. We present the empirical sizes for NNCT-tests in Table 2, where $\widehat{\alpha}_P$ is the empirical significance level for Pielou's test, $\widehat{\alpha}_D$ is for Dixon's test, $\widehat{\alpha}_I$, $\widehat{\alpha}_{II}$, and $\widehat{\alpha}_{III}$ are for versions I, II, and III of the new tests, respectively, and $\widehat{\alpha}_{P,mc}$ is for the Monte Carlo corrected version of Pielou's test as in Equation (22). The empirical size estimates are also plotted against the sample size combinations in Figure 1, where the trend and performance of the tests are easier to detect.

Observe that Pielou's test is extremely liberal in rejecting $H_o$ and its empirical size is severely affected by the difference in the sample (or class) sizes. That is, when the sample sizes are very different (i.e., $(n_1,n_2) \in \{(10,50), (50,10)\}$), the empirical sizes are significantly smaller than those for the other sample size combinations. This seems to work in favor of Pielou's test when applied on NNCTs, as it is extremely liberal, and the difference in the sample sizes reduces its size significantly toward the nominal level. These results were also presented in more detail in (Ceyhan (2006)); we include them here in order to compare Pielou's test with the Monte Carlo corrected version $\mathcal{X}_{P,mc}^2$. The NNCT-tests other than Pielou's test are about the desired level (or size) when $n_1$ and $n_2$ are both > 30, and mostly conservative otherwise. However, in general, if Pielou's test were at the desired level for similar sample sizes, it would have been extremely conservative for very different sample sizes. The Monte Carlo corrected version of Pielou's test, $\mathcal{X}_{P,mc}^2$, has significantly smaller empirical sizes than the uncorrected one, $\mathcal{X}_P^2$. However, it is extremely conservative when at least one sample is too small (i.e., < 30). Dixon's test and $\mathcal{X}_{II}^2$ are usually conservative when at least one sample is small (i.e., < 30), liberal for one case ($(n_1,n_2) = (50,100)$), and about the appropriate nominal level $\alpha$ for the rest of the sample size combinations. On the other hand, $\mathcal{X}_I^2$ is also conservative when at least one sample is small (i.e., < 30), liberal for small and equal sample sizes (i.e., $(n_1,n_2) \in \{(10,10), (30,30)\}$), and about the nominal level for other cases. Finally, $\mathcal{X}_{III}^2$ is usually conservative when at least one sample is small (i.e., < 30), and is about the nominal level for other cases. In summary, Pielou's test is extremely liberal, while the Monte Carlo corrected version is sporadic for smaller samples. Version I of the new tests is appropriate for large samples, but sporadic for small samples. Dixon's test reveals less fluctuation, and is more appropriate for larger samples. Versions II and III of the new tests are conservative for small sample sizes, and about the desired level for large samples.



Empirical significance levels of the NNCT-tests

| $(n_1,n_2)$ | $\widehat{\alpha}_P$ | $\widehat{\alpha}_D$ | $\widehat{\alpha}_I$ | $\widehat{\alpha}_{II}$ | $\widehat{\alpha}_{III}$ | $\widehat{\alpha}_{P,mc}$ |
|---|---|---|---|---|---|---|
| (10,10) | .1280* | .0432= | .0593* | .0461= | .0439= | .0608* |
| (10,30) | .1429* | .0440= | .0451= | .0421= | .0410= | .0320= |
| (10,50) | .0664* | .0482 | .0335= | .0423= | .0397= | .0292= |
| (30,10) | .1383* | .0390= | .0411= | .0383= | .0391= | .0282= |
| (30,30) | .1339* | .0464 | .0544* | .0476 | .0427= | .0552* |
| (30,50) | .1319* | .0454= | .0507 | .0481 | .0504 | .0484 |
| (50,10) | .0654* | .0529 | .0326= | .0468 | .0379= | .0287= |
| (50,30) | .1275* | .0429= | .0494 | .0468 | .0469 | .0477 |
| (50,50) | .1397* | .0508 | .0494 | .0497 | .0499 | .0494 |
| (50,100) | .1223* | .0560* | .0501 | .0564* | .0516 | .0499 |
| (100,50) | .1190* | .0483 | .0463= | .0492 | .0479 | .0455= |
| (100,100) | .1324* | .0504 | .0524 | .0519 | .0489 | .0524 |

Table 2: The empirical significance levels for Pielou's, Dixon's, and the new overall NNCT-tests based on 10000 Monte Carlo simulations of the CSR independence pattern. $\widehat{\alpha}_P$ stands for the empirical significance level for Pielou's test, $\widehat{\alpha}_D$ for Dixon's test, $\widehat{\alpha}_I$, $\widehat{\alpha}_{II}$, and $\widehat{\alpha}_{III}$ for versions I, II, and III of the new tests, respectively, and $\widehat{\alpha}_{P,mc}$ for the Monte Carlo corrected version of Pielou's test as in Equation (22). (=: the empirical size is significantly smaller than .05; i.e., the test is conservative. *: the empirical size is significantly larger than .05; i.e., the test is liberal.)
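The thresholds .0464 and .0536 quoted in the caption of Figure 1 are consistent with one-sided normal-approximation tests of a binomial proportion at the .05 level with 10000 replicates. The sketch below reproduces them under that assumption; it is a plausible reading rather than a documented procedure of the paper, and the function names and the choice to compare against thresholds rounded to four decimals are illustrative.

```python
from statistics import NormalDist


def size_thresholds(alpha=0.05, n_mc=10000, level=0.05):
    """Lower/upper cut-offs for an empirical size under a one-sided z-test."""
    z = NormalDist().inv_cdf(1.0 - level)        # about 1.645
    se = (alpha * (1.0 - alpha) / n_mc) ** 0.5   # binomial standard error
    return alpha - z * se, alpha + z * se


def classify_size(size, alpha=0.05, n_mc=10000):
    """Label an empirical size as conservative, liberal, or about nominal.

    Assumption: thresholds are rounded to four decimals before comparing,
    to match the precision at which the sizes are reported.
    """
    lower, upper = size_thresholds(alpha, n_mc)
    lower, upper = round(lower, 4), round(upper, 4)
    if size < lower:
        return "conservative"
    if size > upper:
        return "liberal"
    return "nominal"


lower_cut, upper_cut = size_thresholds()
labels = [classify_size(s) for s in (0.0432, 0.0464, 0.0508, 0.0560)]
```

Applied to the Dixon column of Table 2, this labels .0432 as conservative, .0464 and .0508 as about nominal, and .0560 as liberal, which agrees with the markers in the table.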



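Empirical significance levels of Cuzick-Edward's $k$-NN and combined tests

| $(n_1,n_2)$ | $\widehat{\alpha}_1$ | $\widehat{\alpha}_2$ | $\widehat{\alpha}_3$ | $\widehat{\alpha}_4$ | $\widehat{\alpha}_5$ | $\widehat{\alpha}_{1-2}^{comb}$ | $\widehat{\alpha}_{1-3}^{comb}$ | $\widehat{\alpha}_{1-4}^{comb}$ | $\widehat{\alpha}_{1-5}^{comb}$ |
|---|---|---|---|---|---|---|---|---|---|
| (10,10) | .0454= | .0398= | .0495 | .0474 | .0492 | .0478 | .0484 | .0477 | .0497 |
| (10,30) | .0306= | .0495 | .0400= | .0458= | .0418= | .0334= | .0434= | .0434= | .0432= |
| (10,50) | .0270= | .0541* | .0367= | .0490 | .0529 | .0438= | .0419= | .0413= | .0409= |
| (30,10) | .0479 | .0493 | .0458= | .0497 | .0471 | .0462= | .0464 | .0479 | .0488 |
| (30,30) | .0507 | .0529 | .0480 | .0475 | .0479 | .0467 | .0452= | .0455= | .0477 |
| (30,50) | .0590* | .0416= | .0429= | .0485 | .0435= | .0471 | .0478 | .0458= | .0463= |
| (50,10) | .0524 | .0492 | .0474 | .0509 | .0548* | .0507 | .0513 | .0511 | .0512 |
| (50,30) | .0535 | .0483 | .0502 | .0485 | .0504 | .0489 | .0472 | .0477 | .0480 |
| (50,50) | .0465 | .0490 | .0516 | .0545* | .0514 | .0522 | .0509 | .0511 | .0514 |

Table 3: The empirical significance levels for Cuzick-Edward's $k$-NN tests $T_k$ for $k = 1, 2, \ldots, 5$ and the combined tests $T_S^{comb}$ for $S = 1-2$, $1-3$, $1-4$, and $1-5$. $\widehat{\alpha}_k$ stands for the empirical size for Cuzick-Edward's $k$-NN test for $k = 1, 2, \ldots, 5$, and $\widehat{\alpha}_S^{comb}$ for the combined test as in Equation (24) for $S \in \{\{1,2\}, \{1,2,3\}, \{1,2,3,4\}, \{1,2,3,4,5\}\}$. Superscript labeling is as in Table 2.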

[Figure 1: two panels, each titled "NNCT-Tests"]

Figure 1: Empirical size estimates for the NNCT-tests based on 10000 Monte Carlo replicates under the CSR independence of the two classes, i.e., uniform data from two classes on the unit square. The horizontal lines are located at .0464 (upper threshold for conservativeness), .0500 (nominal level), and .0536 (lower threshold for liberalness). The numbers in the horizontal axis labels represent sample (i.e., class) size combinations: 1=(10,10), 2=(10,30), 3=(10,50), 4=(30,10), 5=(30,30), 6=(30,50), 7=(50,10), 8=(50,30), 9=(50,50), 10=(50,100), 11=(100,50), 12=(100,100). The empirical size labeling is as in Table 2. Notice that they are arranged in increasing order of the first and then the second entries. The size values for discrete sample size combinations are joined by piecewise straight lines for better visualization.

In the simulated patterns under CSR independence of classes, class $X$ represents the cases and class $Y$ represents the controls in the context of Cuzick-Edward's tests. However, the simulated patterns are not realistic for the case/control framework, since $X$ and $Y$ points are from a homogeneous Poisson process, while case and control locations usually exhibit inhomogeneity in practice. So Cuzick-Edward's tests are not used in the conventional sense (as in the case/control framework) here, but instead are used to test deviations of the classes from CSR independence. The empirical sizes for Cuzick-Edward's $k$-NN tests for $k \leq 5$ and $T_S^{comb}$ for 4 combinations of the tests are presented in Figure 2, where $\widehat{\alpha}_k$ is the empirical size for Cuzick-Edward's $k$-NN test for $k = 1, 2, \ldots, 5$, and $\widehat{\alpha}_S^{comb}$ is the empirical size for the combined test as in Equation (24) for $S \in \{\{1,2\}, \{1,2,3\}, \{1,2,3,4\}, \{1,2,3,4,5\}\}$. For brevity in notation, the index set $S$ is written as $1-j$ for $j = 2, 3, 4, 5$. Due to the computational cost in time for Cuzick-Edward's test, we only present 9 sample size combinations, compared to 12 sample size combinations for the NNCT-tests.

Observe that Cuzick-Edward's $k$-NN tests and $T_S^{comb}$ tests are usually conservative when $n_1 < 30$ (recall that $n_1$ corresponds to the number of cases in a case/control framework) and about the desired size for other sample size combinations. In particular, $T_1$ and $T_2$ are more conservative than the other tests. Cuzick-Edward's $k$-NN tests for $k > 2$ are about the desired size for most of the sample size combinations. The size performance of $T_k$ seems to get better as $k$ increases, since the size gets closer to the desired nominal level of .05. However, for most sample sizes, $T_4$ seems to have the best empirical size performance. Then come $T_5$, $T_3$, $T_2$, and $T_1$ in decreasing order of size performance. On the other hand, the combined tests are conservative when the number of cases is < 30, and about the desired size for most of the sample size combinations. In particular, $T_{1-2}^{comb}$ is the most conservative among the combined tests. $T_S^{comb}$ tests usually have better size performance compared to each $T_k$ for $k \in S$. Furthermore, as the number of combined tests (i.e., the size of the index set $S$) increases, the size gets closer to the nominal level; hence $T_{1-5}^{comb}$ exhibits the best size performance.

[Figure 2: three panels, two titled "Cuzick-Edward's k-NN Tests" and one titled "Cuzick-Edward's Combined Tests"]

Figure 2: Empirical size estimates for Cuzick-Edward's $k$-NN tests and $T_S^{comb}$ (i.e., combined) tests based on 10000 Monte Carlo simulations under the CSR independence pattern, i.e., uniform data in the unit square. The horizontal lines are as in Figure 1. The numbers in the horizontal axis labels represent sample (i.e., class) size combinations: 1=(10,10), 2=(10,30), 3=(10,50), 4=(30,10), 5=(30,30), 6=(30,50), 7=(50,10), 8=(50,30), 9=(50,50). The empirical size labeling is as in Table 3.

Remark 5.1. Main Result of Monte Carlo Simulations under CSR Independence: Based on the simulation results under the CSR independence of the points, we recommend the disuse of Pielou's test in practice, as it is extremely liberal, hence might give false alarms when the pattern is actually not significantly different from CSR independence. None of the other NNCT-tests we consider have the desired level when at least one sample size is small, so that the cell count(s) in the corresponding NNCT have a high probability of being < 5. This usually corresponds to the case that at least one sample size is < 10 or the sample sizes (i.e., relative abundances) are very different in the simulation study. When sample sizes are small (hence the corresponding cell counts are < 5), the asymptotic approximation of the NNCT-tests is not appropriate. However, when sample sizes are very different, cell counts are also more likely to be < 5, compared to cell counts for similar sample sizes (roughly, the sample sizes are similar when $\max_i(n_i)/\min_i(n_i) < 2$). Dixon's test and versions II and III of the new tests tend to be conservative when the NNCT contains cell(s) whose counts are < 5. For larger samples (i.e., the cell counts are larger than 5), NNCT-tests yield empirical sizes that are about the desired nominal level. Version I of the new tests and the Monte Carlo corrected version of Pielou's test are liberal when $n_1 = n_2 < 30$, and conservative for $n_1 \neq n_2 < 30$; for other sample sizes they are about the desired level. So Dixon (1994) recommends Monte Carlo randomization for his test when some cell count(s) are < 5 in a NNCT. We extend this recommendation to all the NNCT-tests (other than Pielou's test) discussed in this article. On the other hand, for large samples, the asymptotic approximation or Monte Carlo randomization can be employed.

For Cuzick-Edward's tests, we recommend Monte Carlo randomization when $n_1 < 10$; otherwise the asymptotic approximation can also be employed. Observe also that $k$-NN tests for $k > 1$ and $T_S^{comb}$ tests attain the normal approximation at smaller sample size combinations (i.e., they approach normality faster) compared to the NNCT-tests. □



5.1 Empirical Significance Levels under RL 



Recall that the clustering tests we consider are conditional under the CSR independence pattern. To better assess their 
empirical size performance, we also perform Monte Carlo simulations under various RL patterns where the tests are not 
conditional. We consider the following three cases for the RL pattern. In each RL case, we first determine the locations 
of points for which class labels are to be assigned randomly. Then we apply the RL procedure to these points for various 
sample size combinations. 

RL Case (1): First, we generate $n = (n_1 + n_2)$ points iid $\mathcal{U}((0,1) \times (0,1))$ for some combinations of $n_1, n_2 \in \{10, 30, 50, 100\}$. In each $(n_1,n_2)$ combination, the locations of these points are taken to be the fixed locations for which we assign the class labels randomly. For each sample size combination $(n_1,n_2)$, we randomly choose $n_1$ points (without replacement) and label them as $X$, and label the remaining $n_2$ points as $Y$ points. We repeat the RL procedure $N_{mc} = 10000$ times for each sample size combination. At each Monte Carlo replication, we compute the NNCT-tests and Cuzick-Edward's $k$-NN and combined tests. Out of these 10000 samples, the number of significant outcomes by each test is recorded. The nominal significance level used in all these tests is $\alpha = .05$. The empirical sizes are calculated as the ratio of the number of significant results to the number of Monte Carlo replications, $N_{mc}$.

RL Case (2): We generate $n_1$ points iid $\mathcal{U}((0,2/3) \times (0,2/3))$ and $n_2$ points iid $\mathcal{U}((1/3,1) \times (1/3,1))$ for some combinations of $n_1, n_2 \in \{10, 30, 50, 100\}$. The locations of these points are taken to be the fixed locations for which we assign the class labels randomly. The RL procedure is applied to these fixed points $N_{mc} = 10000$ times for each sample size combination, and the empirical sizes for the tests are calculated similarly as in RL Case (1).

RL Case (3): We generate $n_1$ points iid $\mathcal{U}((0,1) \times (0,1))$ and $n_2$ points iid $\mathcal{U}((2,3) \times (0,1))$ for some combinations of $n_1, n_2 \in \{10, 30, 50, 100\}$. The RL procedure is applied and the empirical sizes for the tests are calculated as in the previous RL Cases.



The locations for which the RL procedure is applied in RL Cases (1)-(3) are plotted in Figure 3 for $n_1 = n_2 = 100$. Although there are many possibilities for the allocation of points to which the RL procedure can be applied, we only chose these three generic cases. In RL Case (1), the allocation of the points is a realization of a homogeneous Poisson process in the unit square; in RL Case (2), the points are a realization of two overlapping clusters; in RL Case (3), the points are a realization of two disjoint clusters.
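The RL procedure described for RL Case (1) can be sketched as follows: fix the locations once, then repeatedly permute the class labels and rebuild the NNCT. This is a minimal illustration under stated assumptions; the function names, the seeds, the small number of replicates, and the brute-force NN search are all hypothetical choices, and no test statistic is computed here.

```python
import math
import random


def nnct(points, labels, classes=("X", "Y")):
    """Nearest neighbor contingency table: rows = base class, columns = NN class."""
    index = {c: r for r, c in enumerate(classes)}
    table = [[0] * len(classes) for _ in classes]
    n = len(points)
    for i in range(n):
        # Brute-force nearest neighbor of point i among the other points.
        nn = min(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )
        table[index[labels[i]]][index[labels[nn]]] += 1
    return table


def rl_replicates(points, n1, n_mc, seed=0):
    """Randomly relabel the fixed points n_mc times and return the NNCTs."""
    rng = random.Random(seed)
    n2 = len(points) - n1
    tables = []
    for _ in range(n_mc):
        labels = ["X"] * n1 + ["Y"] * n2
        rng.shuffle(labels)  # same as choosing n1 points without replacement as X
        tables.append(nnct(points, labels))
    return tables


# Fixed locations as in RL Case (1): n1 + n2 uniform points in the unit square.
rng = random.Random(42)
n1, n2 = 10, 30
fixed_points = [(rng.random(), rng.random()) for _ in range(n1 + n2)]
tables = rl_replicates(fixed_points, n1, n_mc=20)
```

Each table has row sums $n_1$ and $n_2$, since every point contributes exactly one count to the row of its own class. For RL Cases (2) and (3), only the line that generates `fixed_points` would change; an empirical size would then be the fraction of replicates in which a chosen test rejects at the .05 level.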

[Figure 3: three scatter plots titled "RL Case (1)", "RL Case (2)", and "RL Case (3)", each with the x coordinate on the horizontal axis]

Figure 3: The fixed locations of points for which the RL procedure is applied for RL Cases (1)-(3) with $n_1 = n_2 = 100$ in the two-class case. Notice that the $x$-axis for RL Case (3) is scaled differently from the others.

We present the empirical significance levels for the NNCT-tests in Figure 4, where the empirical significance level labeling is as in Table 2. Observe that, as in the CSR independence case, Pielou's test is extremely liberal for each sample size combination under each RL Case. The Monte Carlo corrected version of Pielou's test is conservative for most small sample size combinations, and usually about the desired size for larger samples. Dixon's test has the desired level for moderate to large sample sizes, but is conservative for small sample sizes. The new versions seem to have the desired level for larger samples, but they fluctuate between conservativeness and liberalness for smaller samples. Dixon's test and version II of the new tests seem to have the best empirical size performance, and for smaller samples we again recommend the Monte Carlo randomization version of the tests.

The RL Cases are more appropriate for the case/control framework of Cuzick-Edward's tests compared to the CSR independence cases. In the RL cases, the locations of the points could represent the locations of the $n = (n_1 + n_2)$ subjects so that $n_1$ of them are patients (i.e., cases) while the rest are controls. The empirical significance levels for Cuzick-Edward's $k$-NN and $T_S^{comb}$ tests are presented in Figure 5, where the empirical significance level labeling is as in Figure 2. Observe that $T_1$ is at about the desired level for similar relative abundances, but is conservative when $n_1 > n_2$ and is liberal when $n_1 < n_2$. For $k > 1$, the empirical size estimates of $T_k$ are about the nominal level. In particular, as $k$ increases, the empirical size estimates of $T_k$ get closer to the nominal level. $T_S^{comb}$ tests are conservative when $n_1 < 10$ and they are about the desired level otherwise. Furthermore, the combined tests $T_S^{comb}$ have better size performance than $T_k$. When all the NN tests are considered, $T_k$ for $k > 3$ and $T_S^{comb}$ have better size performance than NNCT-tests.

Remark 5.2. Main Result of Monte Carlo Simulations under RL: Based on the simulation results under RL, we
reach the same conclusions for the NNCT-tests as in Remark 5.1. That is, we recommend against the use of Pielou's
test; and when sample sizes are small (hence the corresponding cell counts are < 5), we recommend using the Monte
Carlo randomization.

Among the NNCT-tests, Dixon's test and version II of the new tests have the best size performance under RL. On
the other hand, among Cuzick-Edwards tests, $T_5$ and the combined tests $T^{comb}$ have better size performance under RL, and they have
about the same size performance as Dixon's test and version II of the new tests. □
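
The Monte Carlo randomization recommended above for small samples can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the implementation used in the simulations: the function names (`nearest_neighbors`, `nnct`, `pearson_stat`, `rl_randomization_pvalue`) are hypothetical, the statistic is the plain Pearson chi-square of the NNCT (i.e., the statistic of Pielou's test), and ties in NN distances are broken by lowest index. Under RL the point locations, and hence the NN structure, stay fixed while the class labels are permuted.

```python
import math
import random


def nearest_neighbors(points):
    """Index of the nearest neighbor of each point (ties: lowest index wins)."""
    nn = []
    for a, pa in enumerate(points):
        others = [b for b in range(len(points)) if b != a]
        nn.append(min(others, key=lambda b: math.dist(pa, points[b])))
    return nn


def nnct(nn, labels, n_classes=2):
    """NNCT: cell (i, j) counts base points of class i whose NN is of class j."""
    table = [[0] * n_classes for _ in range(n_classes)]
    for a, b in enumerate(nn):
        table[labels[a]][labels[b]] += 1
    return table


def pearson_stat(table):
    """Pearson chi-square statistic of the table (Pielou's test statistic)."""
    n = sum(map(sum, table))
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    stat = 0.0
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            expected = r * c / n
            if expected > 0:
                stat += (table[i][j] - expected) ** 2 / expected
    return stat


def rl_randomization_pvalue(points, labels, n_perm=999, seed=0):
    """Monte Carlo p-value under RL: permute labels over the fixed locations."""
    rng = random.Random(seed)
    nn = nearest_neighbors(points)  # NN structure does not change under RL
    observed = pearson_stat(nnct(nn, labels))
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # one random relabeling
        if pearson_stat(nnct(nn, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Any of the NNCT-test statistics could be substituted for `pearson_stat`; the Pearson statistic is used here only because it needs nothing beyond the cell counts.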

[Figure 4 panels: RL Case (1), RL Case (2), RL Case (3)]

Figure 4: The empirical size estimates of the NNCT-tests based on 10000 Monte Carlo replications under
RL Cases (1)-(3) for various sample size combinations. The horizontal lines are as in Figure 1. The numbers
in the horizontal axis labels represent sample (i.e., class) size combinations: 1=(10,10), 2=(10,30), 3=(10,50),
4=(30,10), 5=(30,30), 6=(30,50), 7=(50,10), 8=(50,30), 9=(50,50), 10=(50,100), 11=(100,50), 12=(100,100).
The empirical size labeling is as in Table 2.

6 Empirical Power Analysis 

To evaluate the power performance of the clustering tests, we only consider alternatives against the CSR independence
pattern. That is, the points are generated in such a way that, for at least one class, they are from an inhomogeneous Poisson process
(conditional on the number of points) in a region of interest (the unit square in the simulations). We
avoid the alternatives against the RL pattern; i.e., we do not consider non-random labeling of a fixed set of points that
would result in segregation or association.



6.1 Empirical Power Analysis under the Segregation Alternatives 



For the segregation alternatives (against the CSR independence pattern), three cases are considered. We generate
$X_i \stackrel{iid}{\sim} \mathcal{U}((0, 1-s) \times (0, 1-s))$ for $i = 1, 2, \ldots, n_1$ and $Y_j \stackrel{iid}{\sim} \mathcal{U}((s, 1) \times (s, 1))$ for $j = 1, 2, \ldots, n_2$. In the pattern generated,
appropriate choices of $s$ will imply that $X_i$ and $Y_j$ are more segregated than expected under CSR independence. That
is, it will be more likely to have $(X,X)$ NN pairs than mixed NN pairs (i.e., $(X,Y)$ or $(Y,X)$ pairs). The three values
of $s$ we consider constitute the three segregation alternatives:

$$H_S^{I}: s = 1/6, \quad H_S^{II}: s = 1/4, \quad \text{and} \quad H_S^{III}: s = 1/3. \tag{28}$$



[Figure 5 panels: RL Case (1), RL Case (2), RL Case (3)]

Figure 5: The empirical size estimates of Cuzick-Edwards $k$-NN and combined tests based on 10000 Monte
Carlo replications under RL Cases (1)-(3) for various sample size combinations. The horizontal lines are as in
Figure 1, the horizontal axis labeling is as in Figure 4, and the empirical size labeling is as in Table 3.

Observe that, from $H_S^{I}$ to $H_S^{III}$ (i.e., as $s$ increases), the segregation gets stronger in the sense that $X$ and $Y$ points
tend to form one-class clumps or clusters. By construction, the points are uniformly generated, hence exhibit homogeneity
with respect to their supports for each class; but with respect to the unit square, these alternative patterns are examples
of departures from first-order homogeneity, which implies segregation of the classes $X$ and $Y$. The simulated segregation
patterns are symmetric in the sense that the $X$ and $Y$ classes are generated to be equally segregated (or clustered) from
each other. Hence, although class $X$ stands for the "cases" in Cuzick-Edwards tests, the results would be similar if class
$Y$ were chosen instead.
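
The segregation alternatives can be simulated with a few lines of code. The sketch below is a minimal illustration of the construction described above; the function name `generate_segregation` and the `seed` argument are hypothetical conveniences, and Python's standard `random` module is assumed as the source of uniform variates.

```python
import random


def generate_segregation(n1, n2, s, seed=None):
    """Draw n1 X points uniformly on (0, 1-s) x (0, 1-s) and
    n2 Y points uniformly on (s, 1) x (s, 1)."""
    rng = random.Random(seed)
    xs = [(rng.uniform(0, 1 - s), rng.uniform(0, 1 - s)) for _ in range(n1)]
    ys = [(rng.uniform(s, 1), rng.uniform(s, 1)) for _ in range(n2)]
    return xs, ys


# The three alternatives use s = 1/6, 1/4, and 1/3; larger s shrinks the
# overlap of the two supports and therefore strengthens the segregation.
x_pts, y_pts = generate_segregation(100, 100, 1 / 3, seed=42)
```

With $s = 0$ both supports coincide with the unit square, which gives the CSR independence pattern (conditional on the class sizes).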



Figure 6: Three realizations for $H_S^{I}: s = 1/6$ (left), $H_S^{II}: s = 1/4$ (middle), and $H_S^{III}: s = 1/3$ (right) with
$n_1 = 100$ $X$ points (solid squares ■) and $n_2 = 100$ $Y$ points (triangles △).



Empirical power estimates under the segregation alternatives

              $(n_1,n_2)$   $\beta_D$   $\beta_I$   $\beta_{II}$   $\beta_{III}$   $\beta_{P,mc}$
$H_S^{I}$       (10,10)     .0775     .0881     .0481      .1026       .0890
                (10,30)     .1414     .1737     .1192      .1983       .1711
                (10,50)     .2193     .2187     .1926      .2491       .2246
                (30,30)     .2904     .3688     .2456      .3837       .3717
$H_S^{II}$      (10,10)     .2305     .2830     .1523      .3203       .2842
                (10,30)     .4555     .5344     .4063      .5734       .5349
                (10,50)     .6174     .6413     .5796      .6794       .6491
                (30,30)     .8141     .8847     .7728      .8917       .8858
$H_S^{III}$     (10,10)     .5817     .6875     .4697      .7257       .6890
                (10,30)     .8787     .9248     .8467      .9406       .9245
                (10,50)     .9528     .9617     .9395      .9711       .9627
                (30,30)     .9969     .9988     .9947      .9990       .9989

Table 4: The empirical power estimates for the tests under the segregation alternatives, $H_S^{I}$ to $H_S^{III}$, with
$N_{mc} = 10000$, for some combinations of $n_1, n_2 \in \{10, 30, 50\}$ at $\alpha = .05$. $\beta_D$ stands for Dixon's test; $\beta_I$, $\beta_{II}$,
and $\beta_{III}$ for versions I, II, and III of the new tests, respectively; and $\beta_{P,mc}$ for the Monte Carlo corrected version
of Pielou's test.

The empirical power estimates for the NNCT-tests for $(n_1,n_2) \in \{(10,10), (10,30), (10,50), (30,30)\}$ are provided in
Table 4. The power estimates against the sample size combinations for all the tests considered are presented in Figure
7, where $\beta_D$ is for Dixon's test; $\beta_I$, $\beta_{II}$, and $\beta_{III}$ are for versions I, II, and III of the new tests, respectively; $\beta_{P,mc}$
is for the Monte Carlo corrected version of Pielou's test; and the remaining labels are for Cuzick-Edwards $k$-NN tests for $k = 1, 2, \ldots, 5$
and for Cuzick-Edwards combined tests $T^{comb}$ for $j = 1, 2, 3, 4$ (the empirical power estimate for Pielou's test is not presented as it is
misleading; see Remarks 5.1 and 5.2). Observe that, as $n = n_1 + n_2$ gets larger, the power estimates get larger. For
the same $n = n_1 + n_2$ values, the power estimate is larger for classes with similar sample sizes. Furthermore, as the
segregation gets stronger, the power estimates get larger. The new version II test $\mathcal{X}^2_{II}$ has the lowest power estimates
for each sample size combination. On the other hand, $\mathcal{X}^2_{I}$, $\mathcal{X}^2_{III}$, and $\mathcal{X}^2_{P,mc}$ have about the same power
estimates, which are larger than those of Dixon's test, with version III of the new tests having the highest power estimate at
each sample size combination. Considering the empirical significance levels and power estimates, we recommend version
III of the new tests for testing against this type of segregation, as $\mathcal{X}^2_{III}$ is at the correct significance level for similar
sample sizes and mildly conservative for very different sample sizes. Additionally, $\mathcal{X}^2_{III}$ has the highest power for all sample
size combinations.

Empirical power estimates under the association alternatives

              $(n_1,n_2)$   $\beta_D$   $\beta_I$   $\beta_{II}$   $\beta_{III}$   $\beta_{P,mc}$
H'l             (10,10)     .1834     .4362     .2596      .2871       .4422
                (10,30)     .4956     .6155     .5782      .4964       .4847
                (10,50)     .5500     .3645     .5820      .2352       .0112
                (30,30)     .4141     .6037     .4663      .5243       .6070
HT              (10,10)     .2222     .5068     .2988      .3448       .5138
                (10,30)     .6003     .7232     .6811      .6148       .6037
                (10,50)     .6512     .4753     .6776      .3267       .0203
                (30,30)     .6157     .7912     .6667      .7283       .7962

Table 5: The empirical power estimates for the tests under the association alternatives, $H_A^{I}$ to $H_A^{III}$, with
$N_{mc} = 10000$, for some combinations of $n_1, n_2 \in \{10, 30, 50\}$ at $\alpha = .05$. The empirical power labeling is as in
Table 4.

Among Cuzick-Edwards $k$-NN tests, $T_k$ with $k > 1$ have about the same power, which is larger than that of $T_1$ for
large samples. In particular, $T_4$ and $T_5$ have the highest power estimates, which are virtually indistinguishable. The
power estimates for the $T_k$ tests seem to be higher than those of the NNCT-tests presented here. As for the $T^{comb}$ tests, their
power estimates are higher than those of the individual $T_k$ tests, and the power estimate increases as $j$ increases in $T^{comb}$, i.e.,
the more successive tests are combined from $1, 2, \ldots, k$, the higher the power estimates for $T^{comb}$.

Considering the power estimates and empirical size performances, the $T_k$ or $T^{comb}$ tests have the best performance, hence
either can be recommended against the segregation alternatives. However, given the computational cost of the $T^{comb}$ tests
for larger $k$ values, we recommend $T_k$ with $k = 4$ or $5$ for the segregation alternatives.



6.2 Empirical Power Analysis under the Association Alternatives 

For the association alternatives (against the CSR independence pattern), we also consider three cases. First, we generate
$X_i \stackrel{iid}{\sim} \mathcal{U}((0, 1) \times (0, 1))$ for $i = 1, 2, \ldots, n_1$. Then we generate $Y_j$ for $j = 1, 2, \ldots, n_2$ as follows. For each $j$, we pick an
$i$ randomly, then generate $Y_j$ as $X_i + R_j (\cos T_j, \sin T_j)'$, where $R_j \stackrel{iid}{\sim} \mathcal{U}(0, r)$ with $r \in (0, 1)$ and $T_j \stackrel{iid}{\sim} \mathcal{U}(0, 2\pi)$. In the
pattern generated, appropriate choices of $r$ will imply that $Y_j$ and $X_i$ are more associated than expected. That is, it will
be more likely to have $(X,Y)$ NN pairs than self NN pairs (i.e., $(X,X)$ or $(Y,Y)$). The three values of $r$ we consider
constitute the three association alternatives:

$$H_A^{I}: r = 1/4, \quad H_A^{II}: r = 1/7, \quad \text{and} \quad H_A^{III}: r = 1/10. \tag{29}$$

Observe that, from $H_A^{I}$ to $H_A^{III}$ (i.e., as $r$ decreases), the association gets stronger in the sense that $X$ and $Y$ points
tend to occur together more and more frequently. By construction, $X$ points are from a homogeneous Poisson process
with respect to the unit square, while $Y$ points exhibit inhomogeneity in the same region. Furthermore, these alternative
patterns are examples of departures from second-order homogeneity, which implies association of class $Y$ with class
$X$. The simulated association patterns are contrary to the case/control framework of Cuzick-Edwards tests, since class
$X$ is used for the case class and class $Y$ points are clustered around $X$ points. However, we still include Cuzick-Edwards
tests to evaluate their performance under this type of deviation from the CSR independence pattern.
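
The association construction can be sketched in the same way. This is a minimal illustration of the generating mechanism described above, not the simulation code behind the reported results; `generate_association` and its `seed` argument are hypothetical names. The text does not say how $Y$ points that fall outside the unit square are treated, so the sketch simply keeps them.

```python
import math
import random


def generate_association(n1, n2, r, seed=None):
    """X points uniform on the unit square; each Y point is placed at a
    uniform(0, r) distance and uniform(0, 2*pi) angle from a randomly
    chosen X point."""
    rng = random.Random(seed)
    xs = [(rng.random(), rng.random()) for _ in range(n1)]
    ys = []
    for _ in range(n2):
        cx, cy = rng.choice(xs)                # pick an X_i at random
        radius = rng.uniform(0, r)             # R_j ~ U(0, r)
        angle = rng.uniform(0, 2 * math.pi)    # T_j ~ U(0, 2*pi)
        # Assumption: Y points landing outside the unit square are kept as is.
        ys.append((cx + radius * math.cos(angle), cy + radius * math.sin(angle)))
    return xs, ys


# Smaller r keeps Y points closer to their parent X points (stronger association).
x_pts, y_pts = generate_association(20, 100, 1 / 10, seed=7)
```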

[Figure 7 panels: empirical power estimates for the NNCT-tests, for Cuzick-Edwards $k$-NN tests, and for Cuzick-Edwards combined tests under $H_S$]

Figure 7: Empirical power estimates for the NNCT-tests, Cuzick-Edwards $k$-NN tests for $k = 1, 2, \ldots, 5$, and
$T^{comb}$ tests for $j = 1, 2, 3, 4$ based on 10000 Monte Carlo replications under the segregation alternatives. The
numbers in the horizontal axis labels represent sample (i.e., class) size combinations: 1=(10,10), 2=(10,30),
3=(10,50), 4=(30,30), 5=(30,50), 6=(50,50). $\beta_D$, $\beta_I$, $\beta_{II}$, $\beta_{III}$, and $\beta_{P,mc}$ are as in Table 4; the remaining labels stand for
Cuzick-Edwards $k$-NN tests for $k = 1, 2, \ldots, 5$ and for Cuzick-Edwards combined tests $T^{comb}$ for $j = 1, 2, 3, 4$.

Figure 8: Three realizations for $H_A^{I}: r = 1/4$ (left), $H_A^{II}: r = 1/7$ (middle), and $H_A^{III}: r = 1/10$ (right) with
$n_1 = 20$ $X$ points (solid squares ■) and $n_2 = 100$ $Y$ points (triangles △).



The empirical power estimates for the NNCT-tests for $(n_1,n_2) \in \{(10,10), (10,30), (10,50), (30,30)\}$ are provided in
Table 5. The power estimates under the association alternatives are presented in Figure 9, where the labeling is as in
Figure 7. Observe that, for similar sample sizes, as $n = n_1 + n_2$ gets larger, the power estimates get larger at each
association alternative. Furthermore, as the association gets stronger, the power estimates get larger at each sample
size combination. Considering the NNCT-tests, for most sample size combinations, version I of the new tests has the
highest power estimate (except for $(n_1,n_2) = (10,50)$, in which case version II of the new tests has the highest power and
most tests perform very poorly, with the Monte Carlo corrected version of Pielou's test being the worst). Hence, considering
the empirical size and power estimates, we recommend version I of the new tests for large samples, and Monte Carlo
randomization for the NNCT-tests for small samples.

Considering Cuzick-Edwards $k$-NN tests, it is seen that $T_1$ has virtually no power for $(n_1,n_2) = (10,30)$ or $(10,50)$,
and $T_2$ has virtually no power for $(n_1,n_2) = (10,50)$. Under H^, T\ has the highest power estimate, while under other
association alternatives, $T_2$ has higher power estimates for larger sample sizes. For smaller samples either or Monte
Carlo randomization of the tests can be used. Considering Cuzick-Edwards $k$-NN tests together with the NNCT-tests,
observe that version III of the new overall tests has the best power performance under the association alternatives.

Among Cuzick-Edwards combined tests, T^°V^^ and T^°y2^ have virtually no power for (jii, 712) = (10, 50). For almost 
all sample size combinations, T^'^^ has the highest power estimates. Considering all tests together, still Til^^ has 
the best power performances under the association alternatives, hence can be recommended for use against this type 
of association. However, given the computational cost of combined tests, one might prefer version III of the new tests
under the association alternatives for larger samples, as its power is very close to that of Til^''. For smaller samples, 
either Monte Carlo randomization for NNCT-tests, or asymptotic approximation for Til^'' or rj^f™* can be used. 

Remark 6.1. Edge Correction for NNCT-Tests: Edge (or boundary) effects are not a concern for testing against
the RL pattern. However, the CSR independence pattern assumes that the study region is unbounded for the analyzed
pattern, which is not the case in practice. So edge effects are a constant problem in the analysis of empirical (bounded)
data sets if the null pattern is CSR independence, and much effort has gone into the development of edge correction
methods (Yamada and Rogersen (2003) and Dixon (2002b)).

[Figure 9 panels: empirical power estimates for the NNCT-tests, for Cuzick-Edwards $k$-NN tests, and for Cuzick-Edwards combined tests under $H_A$]

Figure 9: Empirical power estimates for the NNCT-tests, Cuzick-Edwards $k$-NN tests for $k = 1, 2, \ldots, 5$, and
$T^{comb}$ tests for $j = 1, 2, 3, 4$ under the association alternatives. The numbers in the horizontal axis labels represent
sample (i.e., class) size combinations: 1=(10,10), 2=(10,30), 3=(10,50), 4=(30,30), 5=(30,50), 6=(50,50).
The empirical power labeling is as in Figure 7.

Two correction methods to mitigate the edge effects on NNCT-tests, namely buffer zone correction and toroidal
correction, are investigated in Ceyhan (2006, 2007), where it is shown that the empirical sizes of the NNCT-tests are
not affected by the toroidal edge correction under CSR independence. On the other hand, the toroidal correction has
a mild influence on the results provided that there are no clusters around the edges. Furthermore, toroidal correction
(slightly) improves the results of some of the segregation tests based on NNCTs. However, toroidal correction is biased for
non-CSR patterns. In particular, if the pattern outside the plot (which is often unknown) is not the same as that inside it,
it yields questionable results (Haase (1995) and Yamada and Rogersen (2003)). The bias is more severe when
the edges cut through some cluster(s). The (outer) buffer zone edge correction method seems to have a slightly stronger
influence on the tests compared to toroidal correction. But for these tests, buffer zone correction does not change the
sizes significantly for most sample size combinations. This is in agreement with the findings of Barot et al. (1999), who
say NN methods only require a small buffer area around the study region. A large buffer area does not help much since
one only needs to be able to see far enough away from an event to find its NN. Once the buffer area extends past the
likely NN distances (i.e., about the average NN distances), it is not adding much helpful information for NNCTs. Hence
we recommend inner or outer buffer zone correction for NNCT-tests with the width of the buffer area being about the
average NN distance. We do not recommend larger buffer areas, since they are wasteful with little additional gain. □
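
An inner buffer zone correction of the kind recommended in the remark can be sketched as follows. This is an illustrative reading, not the procedure of the cited studies: the function name `nnct_with_inner_buffer`, the rectangular `bounds` argument, and the tie-breaking rule (lowest index) are assumptions. Points within `width` of the boundary are not used as base points, but they remain available as NNs of interior points.

```python
import math


def nnct_with_inner_buffer(points, labels, width, bounds=(0.0, 1.0, 0.0, 1.0)):
    """NNCT in which only points at least `width` away from every edge of the
    rectangle `bounds` = (xmin, xmax, ymin, ymax) serve as base points."""
    xmin, xmax, ymin, ymax = bounds
    k = max(labels) + 1
    table = [[0] * k for _ in range(k)]
    for a, (xa, ya) in enumerate(points):
        interior = (xmin + width <= xa <= xmax - width
                    and ymin + width <= ya <= ymax - width)
        if not interior:
            continue  # buffer-zone points act only as potential NNs
        others = [b for b in range(len(points)) if b != a]
        nn = min(others, key=lambda b: math.dist(points[a], points[b]))
        table[labels[a]][labels[nn]] += 1
    return table
```

Following the recommendation above, `width` would be set to about the average NN distance of the data; `width=0` gives the uncorrected NNCT.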



7 Examples 



We illustrate the tests on three examples: two ecological data sets, namely Pielou's Douglas-fir/ponderosa pine data
(Pielou (1961)) and swamp tree data (Good and Whipple (1982)), and an epidemiological data set, namely the leukemia
data set (Diggle (2003)).



7.1 Pielou's Data 



Pielou used a completely mapped data set that is comprised of ponderosa pine (Pinus ponderosa) and Douglas-fir trees
(Pseudotsuga menziesii, formerly P. taxifolia) from a region in British Columbia (Pielou (1961)). Her data are also
used by Dixon as an illustrative example (Dixon (1994)). Since the data consist of individuals of different species, it is
more reasonable to assume CSR independence as the underlying pattern for the null hypothesis of randomness in NN
structure. Deviation from CSR independence implies that the two classes are a priori the result of different processes.
The question of interest is whether the two tree species are segregated, associated, or do not significantly deviate from
CSR independence. The corresponding NNCT and the percentages are provided in Table 6. The percentages for the cells
are based on the size of each tree species. For example, 86 % of Douglas-firs have NNs from Douglas-firs, and the remaining
15 % have NNs from ponderosa pines. The row and column percentages are marginal percentages with respect to
the total sample size. The percentage values in the diagonal cells are suggestive of segregation for both species.

                          NN
               D.F.          P.P.          sum
base   D.F.    137 (86 %)    23 (15 %)     160 (70 %)
       P.P.     38 (56 %)    30 (44 %)      68 (30 %)
       sum     175 (77 %)    53 (23 %)     228 (100 %)

Table 6: The NNCT for Pielou's data and the corresponding percentages (in parentheses). D.F. = Douglas-fir,
P.P. = ponderosa pine.
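
As a quick check on the cell counts in Table 6, the uncorrected Pearson chi-square statistic (the statistic of Pielou's test) can be computed directly from the NNCT. The sketch below is illustrative only; the helper names `pearson_chi2` and `chi2_sf_1df` are hypothetical, and the p-value uses the closed form of the $\chi^2_1$ survival function, $\mathrm{erfc}(\sqrt{x/2})$. Since Pielou's test is liberal under CSR independence and RL, the resulting p-value should not be taken at face value.

```python
import math


def pearson_chi2(table):
    """Pearson chi-square statistic of a contingency table (no correction)."""
    n = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    stat = 0.0
    for i, r in enumerate(row_sums):
        for j, c in enumerate(col_sums):
            expected = r * c / n
            stat += (table[i][j] - expected) ** 2 / expected
    return stat


def chi2_sf_1df(x):
    """Upper tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


# Rows: base class (D.F., P.P.); columns: NN class (D.F., P.P.), from Table 6.
pielou_nnct = [[137, 23], [38, 30]]
stat = pearson_chi2(pielou_nnct)
p_value = chi2_sf_1df(stat)
print(round(stat, 2), p_value < 1e-4)  # 23.66 True
```

The value 23.66 agrees with the statistic reported for Pielou's test on these data.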

The raw data are not available, but fortunately, Pielou (1961) provided $Q = 162$ and $R = 134$. Hence, we could
calculate the NNCT-test statistics, which are provided in Table 7, where $C_D$ stands for Dixon's test of segregation, $\mathcal{X}^2_P$
for Pielou's test, $\mathcal{X}^2_{P,mc}$ for Pielou's test by Monte Carlo simulations, $\mathcal{X}^2_{I}$ for version I as in Equation (12), $\mathcal{X}^2_{II}$ for version II as in
Equation (18), and $\mathcal{X}^2_{III}$ for version III as in Equation (21). The p-values are also provided below the test statistics in parentheses.
Observe that all of the tests are significant, implying significant deviation from independence in NN structure (hence
CSR independence), and the percentages in the NNCT imply that there is significant segregation for both species. On
the other hand, since the raw data are not available, neither Cuzick-Edwards $k$-NN and combined tests nor Ripley's $K$-
or $L$-functions and pair correlation functions can be calculated.



7.2 Swamp Tree Data 

Good and Whipple (1982) considered the spatial patterns of tree species along the Savannah River, South Carolina,
U.S.A. From these data, Dixon (2002a) used a single 50 m × 200 m rectangular plot to illustrate his tests. All live or
dead trees with 4.5 cm or more dbh (diameter at breast height) were recorded together with their species. Hence it is
an example of a realization of a marked multivariate point pattern. The plot contains 13 different tree species, four of
which comprise over 90 % of the 734 tree stems. The remaining tree stems were categorized as "other trees". The plot
consists of 215 water tupelo (Nyssa aquatica), 205 black gum (Nyssa sylvatica), 156 Carolina ash (Fraxinus caroliniana),
98 bald cypress (Taxodium distichum), and 60 stems of 8 additional species (i.e., other species). We will only consider
the three most frequent tree species in this data set (i.e., water tupelos, black gums, and Carolina ashes). So a 3 × 3
NNCT-analysis is conducted for this data set. If segregation among the less frequent species were important, a more
detailed 5 × 5 or a 12 × 12 NNCT-analysis should be performed. The locations of these trees in the study region are
plotted in Figure 10, and the corresponding 3 × 3 NNCT together with percentages based on row and grand sums is
provided in Table 8. For example, for black gum as the base species and Carolina ash as the NN species, the cell count
is 31, which is 15 % of the 205 black gums (which is 36 % of all trees). Observe that the percentages and Figure 10 are
suggestive of segregation for all three tree species, since the observed percentage of species with themselves as the NN is
much larger than the row percentages.

NNCT-test statistics and the associated p-values

Data               $\mathcal{X}^2_P$    $C_D$        $\mathcal{X}^2_{P,mc}$   $\mathcal{X}^2_{I}$    $\mathcal{X}^2_{II}$   $\mathcal{X}^2_{III}$
Pielou's Data      23.66       19.67       12.73       19.29       13.09       14.41
                   (< .0001)   (.0001)     (.0004)     (.0001)     (.0003)     (.0001)
Swamp Tree Data    212.20      133.48      132.13      132.42      133.20      129.16
                   (< .0001)   (< .0001)   (< .0001)   (< .0001)   (< .0001)   (< .0001)
Leukemia Data      3.31        2.25        1.98        2.10        2.13        2.02
                   (.0687)     (.3249)     (.1599)     (.3505)     (.1449)     (.1547)

Table 7: Test statistics and the associated p-values (in parentheses) for NNCT-tests for the example data
sets. $C_D$ stands for Dixon's test of segregation, $\mathcal{X}^2_P$ for Pielou's test, $\mathcal{X}^2_{P,mc}$ for Pielou's test by Monte Carlo
simulations, $\mathcal{X}^2_{I}$ for version I as in Equation (12), $\mathcal{X}^2_{II}$ for version II as in Equation (18), and $\mathcal{X}^2_{III}$ for version
III as in Equation (21).



[Figure 10: scatter plot titled "Swamp Tree Data"; horizontal axis: x coordinate (m)]

Figure 10: The scatter plot of the locations of water tupelos (circles ○), black gum trees (triangles △), and
Carolina ashes (pluses +).

The locations of the tree species can be viewed a priori as resulting from different processes, so the more appropriate
null hypothesis is the CSR independence pattern. Hence our inference will be a conditional one (see Remark 3.1). We
calculate $Q = 472$ and $R = 454$ for this data set. We present the test statistics and the associated p-values for the
NNCT-tests in Table 7. Based on the NNCT-tests, we find that the segregation between all species is significant, since all the
tests considered yield significant p-values and the diagonal cells are larger than expected.

                                NN
               W.T.          B.G.          C.A.          sum
base   W.T.    134 (62 %)     47 (22 %)    34 (16 %)     215 (37 %)
       B.G.     47 (23 %)    128 (62 %)    31 (15 %)     206 (36 %)
       C.A.     34 (22 %)     27 (17 %)    96 (61 %)     157 (27 %)
       sum     215 (37 %)    202 (35 %)   162 (28 %)     578 (100 %)

Table 8: The NNCT for swamp tree data and the corresponding percentages (in parentheses), where the cell
percentages are with respect to the row sums and marginal percentages are with respect to the total size. W.T.
= water tupelos, B.G. = black gums, and C.A. = Carolina ashes.



Test statistics and the associated p-values for Swamp Tree Data

Cuzick-Edwards $k$-NN tests

Data            $T_1$             $T_2$             $T_3$             $T_4$             $T_5$
W.T. vs B.G.    155 (< .0001)   309 (< .0001)   451 (< .0001)   588 (< .0001)   703 (< .0001)
B.G. vs W.T.    149 (< .0001)   279 (< .0001)   411 (< .0001)   529 (< .0001)   650 (< .0001)
W.T. vs C.A.    171 (< .0001)   337 (< .0001)   498 (< .0001)   653 (< .0001)   812 (< .0001)
C.A. vs W.T.    108 (< .0001)   213 (< .0001)   297 (< .0001)   378 (< .0001)   455 (< .0001)
B.G. vs C.A.    159 (< .0001)   303 (< .0001)   461 (< .0001)   606 (< .0001)   755 (< .0001)
C.A. vs B.G.    115 (< .0001)   216 (< .0001)   315 (< .0001)   410 (< .0001)   511 (< .0001)

Cuzick-Edwards combined tests

Data            $T^{comb}_{1-2}$      $T^{comb}_{1-3}$      $T^{comb}_{1-4}$      $T^{comb}_{1-5}$
W.T. vs B.G.    7.31 (< .0001)    8.15 (< .0001)    8.78 (< .0001)    9.06 (< .0001)
B.G. vs W.T.    7.27 (< .0001)    7.95 (< .0001)    8.36 (< .0001)    8.70 (< .0001)
W.T. vs C.A.    7.64 (< .0001)    8.71 (< .0001)    9.46 (< .0001)    10.14 (< .0001)
C.A. vs W.T.    7.29 (< .0001)    7.93 (< .0001)    8.27 (< .0001)    8.56 (< .0001)
B.G. vs C.A.    6.80 (< .0001)    7.83 (< .0001)    8.55 (< .0001)    9.23 (< .0001)
C.A. vs B.G.    7.96 (< .0001)    8.83 (< .0001)    9.41 (< .0001)    9.57 (< .0001)

Table 9: Test statistics and the associated p-values (in parentheses) for Cuzick-Edwards NN tests for the swamp
tree data. $T_k$ stands for Cuzick-Edwards $k$-NN test for $k = 1, 2, 3, 4, 5$ and $T^{comb}_{1-j}$ stands for the combined tests
for $j = 2, 3, 4, 5$.



The null hypothesis for the swamp tree data is the CSR independence of three tree species, hence the data do not fall in
the generalized two-class case/control framework of Cuzick-Edwards tests. So we apply these tests on the swamp tree
data for two species at a time. As Cuzick-Edwards tests are more sensitive to the clustering of the cases (i.e., the
first class in the generalized framework), they are not symmetric in the two species they are used for. Hence, we apply
these tests for each of the six different ordered pairs of tree species, and the resulting test statistics and the associated
p-values are presented in Table 9. Based on Cuzick-Edwards $k$-NN tests (i.e., $T_k$), water tupelos and black gums exhibit
significant segregation since all $T_k$ values are significant. The same holds for water tupelos versus Carolina ashes and black
gums versus Carolina ashes.

Based on the NNCT-tests and Cuzick-Edwards $k$-NN tests above, we conclude that tree species exhibit significant
deviation from the CSR independence pattern. Considering Figure 10 and the corresponding NNCT in Table 8, this
deviation is toward the segregation of the tree species. However, the results of NNCT-tests pertain to small scale
interaction at about the average NN distances, and the results of Cuzick-Edwards $k$-NN tests pertain to interaction at
about the average $k$-NN distances. In Figure 11, we present the plots of $L_{ii}(t) - t$ functions for each species as well as
the plot for the entire data combined. We also present the upper and lower 95 % confidence bounds for each $L_{ii}(t) - t$.
Observe that all trees combined exhibit significant aggregation (the $L_{00}(t) - t$ curve is above the upper confidence bound) at all scales. Water tupelos exhibit aggregation
for the range of the plotted distances; black gums exhibit significant aggregation for distances $t > 1$ m; and Carolina
ashes exhibit significant aggregation for the range of plotted distances. Hence, segregation of the species might be due
to different levels and types of aggregation of the species in the study region.
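
To make the quantity $L(t) - t$ concrete, a naive estimator of Ripley's univariate $L$-function can be sketched as follows. This is a simplified illustration, not the estimator behind the plotted curves: it applies no edge correction (which published analyses normally do), and the function name `ripley_l_minus_t` and the `area` argument are hypothetical. Positive values suggest aggregation at scale $t$, negative values suggest regularity.

```python
import math


def ripley_l_minus_t(points, t, area):
    """Naive (edge-uncorrected) estimate of L(t) - t for a planar pattern."""
    n = len(points)
    # Count ordered pairs (i, j), i != j, lying within distance t of each other.
    pairs = sum(
        1
        for i in range(n)
        for j in range(n)
        if i != j and math.dist(points[i], points[j]) <= t
    )
    k_hat = area * pairs / (n * (n - 1))  # estimate of Ripley's K(t)
    return math.sqrt(k_hat / math.pi) - t


# A tight clump in the unit square gives a clearly positive value at t = 0.05.
clump = [(0.5, 0.5), (0.505, 0.5), (0.5, 0.505), (0.505, 0.505)]
value = ripley_l_minus_t(clump, 0.05, area=1.0)
```

Confidence bounds such as those in the figures would be obtained by recomputing this curve on patterns simulated under the null hypothesis.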

[Figure 11 panels: All Trees, Water Tupelos, Black Gums, Carolina Ashes; horizontal axes: t (m)]

Figure 11: Second-order properties of swamp tree data. Functions plotted are Ripley's univariate $L$-functions
$L_{ii}(t) - t$ for $i = 1, 2, 3$, where $i = 1$ for water tupelos, $i = 2$ for black gums, and $i = 3$ for Carolina ashes. The
dashed lines around 0 are the upper and lower 95 % confidence bounds for the $L$-functions based on Monte
Carlo simulation under the CSR independence pattern. Note also that vertical axes are not identically scaled
for all plots.

We also calculate Ripley's bivariate $L$-function for each pair of tree species. In Figure 12, we present
the bivariate plots of $L_{ij}(t) - t$ functions together with the upper and lower 95 % confidence bounds for each pair of
species. Due to the symmetry of $L_{ij}(t)$, we only present the plots for 3 different pairs. Observe that for distances up to
$t \approx 10$ m, water tupelos and black gums exhibit significant segregation ($L_{12}(t) - t$ is below the lower confidence bound),
for $35 < t < 40$ m they exhibit significant association, and for the rest of the plotted distances their interaction is not
significantly different from the CSR independence pattern; water tupelos and Carolina ashes are significantly segregated
up to about $t \approx 10$ m, and for $t > 15$ m they are significantly associated. Black gums and Carolina ashes are significantly
segregated for $t > 2$ m.

Since Ripley's $K$-function is cumulative, we also provide the pair correlation functions for all trees and each species
for the swamp tree data in Figure 13. Observe that all trees are aggregated around distance values of 0-1, 3, 4-7, and 8-10 m;
water tupelos are aggregated for distance values of 0-7 m; black gums are aggregated for distance values of 1-6 and 8-11
m; Carolina ashes are aggregated over the whole range of the plotted distances. Comparing Figures 11 and 13, we see that
Ripley's $L$ and pair correlation functions detect the same patterns but with different distance values. That is, Ripley's
$L$ implies that the particular pattern is significant for a wider range of distance values compared to $g(t)$, since Ripley's
$L$ is cumulative, so the values of $L$ at small scales confound the values of $L$ at larger scales (Loosmore and Ford (2006)).
Hence the results based on the pair correlation function $g(t)$ are more reliable.

The bivariate pair correlation functions for the species in swamp tree data are plotted in Figure 14. Observe that
water tupelos and black gums are segregated for distance values of 0-1 m; water tupelos and Carolina ashes are segregated
for values of 0-1 and 2.5 m and are associated for values about 6 and 11 m; black gums and Carolina ashes are segregated
for 2-5, 6-8.5, and 9.5-12 meters.

[Figure 12 panels: W.T. vs B.G., W.T. vs C.A., B.G. vs C.A.; horizontal axes: t (m)]

Figure 12: Second-order properties of swamp tree data. Functions plotted are Ripley's bivariate $L$-functions
$L_{ij}(t) - t$ for $i, j = 1, 2, 3$ and $i \ne j$, where $i = 1$ for water tupelos (W.T.), $i = 2$ for black gums (B.G.), and
$i = 3$ for Carolina ashes (C.A.). The dashed lines around 0 are the upper and lower 95 % confidence bounds
for the $L$-functions based on Monte Carlo simulations under the CSR independence pattern. Note also that
vertical axes are differently scaled.

Since the estimator variance and hence the bias are considerably large for small $t$ if $g(t) > 0$, the confidence bands
for smaller $t$ values are much wider compared to those for larger $t$ values (see, for example, Figures 13 and 14). So pair
correlation function analysis is more reliable for larger distances, and it is safer to use $g(t)$ for distances larger than the
average NN distance in the data set. Comparing Figure 11 with Figure 13 and Figure 12 with Figure 14, we see that
Ripley's $L$ and pair correlation functions usually detect the same large-scale pattern but at different ranges of distance
values. Ripley's $L$ suggests that the particular pattern is significant for a wider range of distance values compared to
$g(t)$, but at larger scales $g(t)$ is more reliable to use.

While second-order analysis (using Ripley's K- and L-functions or the pair correlation function) provides information on 
the univariate and bivariate patterns at all scales (i.e., for all distances), NNCT-tests summarize the spatial interaction 
at smaller scales (for distances about the average NN distance in the data set). In particular, for the swamp tree 
data the average NN distance (± standard deviation) is about 1.93 (± 1.17) meters, and Ripley's L-function 
and NNCT-tests yield similar results for distances of about 2 meters. Further, the average k-NN distances ± standard 
deviations for k = 2, 3, 4, 5 are 2.94 ± 1.36, 3.81 ± 1.41, 4.47 ± 1.45, and 5.10 ± 1.49, respectively. 
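
The average k-NN distances quoted above can be computed along the following lines. This is a minimal Python sketch 
that assumes planar coordinates and Euclidean distance; the function name `knn_distance_summary` and the toy 
coordinates are hypothetical and are not the swamp tree data.

```python
import math
import statistics


def knn_distance_summary(points, k):
    """Mean and sample standard deviation of the distance to the k-th nearest neighbor."""
    if not 1 <= k < len(points):
        raise ValueError("k must satisfy 1 <= k < number of points")
    kth = []
    for i, p in enumerate(points):
        # Sorted distances from point i to every other point.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        kth.append(dists[k - 1])
    return statistics.mean(kth), statistics.stdev(kth)


# Toy example on a line: the NN distances are 1, 1, 2, 3.
pts = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (6.0, 0.0)]
mean_1, sd_1 = knn_distance_summary(pts, 1)
mean_2, sd_2 = knn_distance_summary(pts, 2)
```

The brute-force search is quadratic in the number of points, which is adequate for data sets of a few hundred points.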



7.3 Leukemia Data 

Cuzick and Edwards (1990) considered the spatial locations of 62 cases of childhood leukemia in the North Humberside 
region of the UK between 1974 and 1982 (inclusive). A sample of 143 controls was selected using the completely 
randomized design from the same region. We analyze the spatial distribution of leukemia cases and controls in this data 
set using a 2 × 2 NNCT. We plot the locations of these points in the study region in Figure 15 and provide the corresponding 
2 × 2 NNCT together with percentages based on row and column sums in Table 10. Observe that the percentages in the 
diagonal cells are about the same as the marginal (row or column) percentages of the subjects in the study, which might 
be interpreted as the lack of any deviation from RL for both classes. Figure 15 is also supportive of this observation. 

                             NN
                  case          control       sum
base   case       25 (38 %)      41 (62 %)     66 (30 %)
       control    39 (26 %)     113 (74 %)    152 (70 %)
       sum        64 (29 %)     154 (71 %)    218 (100 %)



Table 10: The NNCT for the North Humberside leukemia data and the corresponding percentages (in parentheses). 
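
To make the construction of such a table concrete, the following Python sketch cross-tabulates the class of each 
point (base) against the class of its nearest neighbor, and then reproduces the row percentages of Table 10 from the 
cell counts. The function names (`nnct`, `row_percentages`) are hypothetical, Euclidean distance is assumed, and ties 
are broken by the lowest index, which is an arbitrary choice.

```python
import math


def nnct(points, labels, classes):
    """Cross-tabulate base class (rows) against nearest-neighbor class (columns)."""
    index = {c: i for i, c in enumerate(classes)}
    table = [[0] * len(classes) for _ in classes]
    for i, p in enumerate(points):
        # Nearest neighbor of point i among all other points.
        j = min((m for m in range(len(points)) if m != i),
                key=lambda m: math.dist(p, points[m]))
        table[index[labels[i]]][index[labels[j]]] += 1
    return table


def row_percentages(table):
    """Percentage of each cell within its row, rounded to whole percents."""
    return [[round(100 * cell / sum(row)) for cell in row] for row in table]


# Cell counts taken from Table 10 (rows: base case, base control).
table_10 = [[25, 41], [39, 113]]
print(row_percentages(table_10))  # [[38, 62], [26, 74]]
```

The diagonal percentages (38 % and 74 %) can then be compared with the marginal percentages of cases and controls, 
as done in the text.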
[Figure 13 panel title: All Trees]

Figure 13: Pair correlation functions for all trees combined and for each species in the swamp tree data. Wide 
dashed lines around 1 (which is the theoretical value) are the upper and lower (pointwise) 95 % confidence 
bounds for the pair correlation functions based on Monte Carlo simulation under the CSR independence pattern. Note also 
that vertical axes are differently scaled. 



Cuzick-Edward's test statistics and the associated p-values for Leukemia Data 

             $T_1$     $T_2$     $T_3$     $T_4$     $T_5$     $T^{comb}_{1-2}$   $T^{comb}_{1-3}$   $T^{comb}_{1-4}$   $T^{comb}_{1-5}$
statistic    25        53        78        95        116       2.12               2.46               2.53               2.58
p-value      (.0647)   (.0043)   (.0014)   (.0093)   (.0099)   (.0170)            (.0068)            (.0057)            (.0048)



Table 11: Test statistics and the associated p-values for Cuzick-Edward's k-NN (i.e., $T_k$) and $T_S^{comb}$ tests for 
North Humberside leukemia data. 



It is reasonable to assume that some process affects a posteriori the population of the North Humberside region so that 
some of the individuals become cases, while others remain healthy (i.e., they are controls). So the appropriate 
null hypothesis is the RL pattern. We calculate Q = 152 and R = 142 for this data set. In Tables 7 and 11, we present 
the test statistics and the associated p-values. Observe that none of the NNCT-tests yields a significant result. On the 
other hand, Cuzick-Edward's $T_k$ are all significant for k > 1, and so are all $T_S^{comb}$ tests. Hence, we conclude that there is 
no significant segregation of cases at small scales (about NN distances), but cases tend to cluster significantly at larger 
scales. 
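
For orientation, the k-NN statistic $T_k$ counts, over all cases, how many of the k nearest neighbors of each case are 
themselves cases, so that $T_1$ coincides with the (case, case) cell of the NNCT (25 in Tables 10 and 11). The Python 
sketch below computes $T_k$ and attaches a Monte Carlo p-value obtained by random relabeling. The relabeling p-value 
is an assumption made for illustration and is not necessarily how the p-values in Table 11 were obtained; the function 
names are hypothetical.

```python
import math
import random


def knn_indices(points, k):
    """Indices of the k nearest neighbors of every point (Euclidean distance)."""
    result = []
    for i, p in enumerate(points):
        order = sorted((m for m in range(len(points)) if m != i),
                       key=lambda m: math.dist(p, points[m]))
        result.append(order[:k])
    return result


def cuzick_edwards_tk(is_case, neighbors):
    """T_k: number of (case, case) pairs in which the second is a k-NN of the first."""
    return sum(is_case[j]
               for i, nbrs in enumerate(neighbors) if is_case[i]
               for j in nbrs)


def rl_p_value(points, is_case, k, n_sim=999, seed=0):
    """Upper-tail Monte Carlo p-value for T_k under random labeling."""
    rng = random.Random(seed)
    neighbors = knn_indices(points, k)  # locations stay fixed under RL
    observed = cuzick_edwards_tk(is_case, neighbors)
    labels = list(is_case)
    at_least = 0
    for _ in range(n_sim):
        rng.shuffle(labels)  # reassign case/control labels at random
        if cuzick_edwards_tk(labels, neighbors) >= observed:
            at_least += 1
    return observed, (at_least + 1) / (n_sim + 1)
```

Because the locations are fixed under RL, the neighbor lists are computed once and only the labels are permuted.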

Based on the NNCT-tests above, we conclude that the cases and controls do not exhibit significant clustering (i.e., 
segregation) at small scales. Based on Cuzick-Edward's tests, we find that the cases are significantly segregated around 
k-NN distances for k > 1. However, NNCT-methods only provide information on spatial interaction for distances about the 
expected NN distance in the data set, and Cuzick-Edward's tests provide information about the k-NN distances. So 
the type and level of interaction might still be different at larger or other scales (i.e., distances 
between the subjects' locations). However, the locations of the subjects in this population (cases and controls together) 
seem to be from an inhomogeneous Poisson process (see also Figure 15). Hence Ripley's K- or L-functions in the general 
form are not appropriate to test for the spatial clustering of the cases (Kulldorff (2006)). So we use the modified version 
[Figure 14 panels: W.T. vs B.G., W.T. vs C.A., B.G. vs C.A.]

Figure 14: Pair correlation functions for each pair of species in the swamp tree data. Wide dashed lines around 
1 (which is the theoretical value) are the upper and lower (pointwise) 95 % confidence bounds for the pair correlation 
functions based on Monte Carlo simulations under the CSR independence pattern. W.T. = water tupelos, B.G. = black 
gums and C.A. = Carolina ashes. Note also that vertical axes are not identically scaled for all plots. 

due to Diggle (2003), namely, $D(t) = K_{11}(t) - K_{22}(t)$, where $K_{ii}(t)$ is Ripley's univariate K-function for class i. In this 
setup, "no spatial clustering" is equivalent to RL of cases and controls on the locations in the sample, which implies 
$D(t) = 0$, since $K_{22}(t)$ measures the degree of spatial aggregation of the controls (i.e., the population at risk), while 
$K_{11}(t)$ measures this same spatial aggregation plus any additional clustering due to the disease. The test statistic $D(t)$ 
is estimated by $\widehat{D}(t) = \widehat{K}_{11}(t) - \widehat{K}_{22}(t)$, where $\widehat{K}_{ii}(t)$ is as in Equation (26). Figure 16 shows the plot of $\widehat{D}(t)$ plus 
and minus two standard errors under RL. Observe that at distances of about 200 and 600 meters, there is evidence for 
mild clustering of disease cases (i.e., segregation of cases from controls), since the empirical function $\widehat{D}(t)$ approaches or slightly 
exceeds the upper limit. At smaller scales, the plot in Figure 16 is consistent with the results of the NNCT analysis. In 
particular, the average NN distance for the leukemia data is 700 (± 1400) m, and the NNCT analysis summarizes the pattern for 
about t = 1000 m, which is depicted in Figure 16. (Further, the average k-NN distances ± standard deviations for 
k = 2, 3, 4, 5 are 1342 ± 2051, 1688 ± 2594, 2152 ± 3188, and 2495 ± 3810, respectively.) This same data set was also 
analyzed by Diggle (2003, pp. 131-132), and similar plots and results were obtained. 
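
The logic of this D-function analysis can be sketched in Python as follows. The sketch uses a naive K-function 
estimator without edge correction, which is an assumption made for brevity and may differ from the estimator in 
Equation (26), and it approximates the standard error under RL by the standard deviation over random relabelings. 
All function names and the toy coordinates are hypothetical.

```python
import math
import random
import statistics


def k_hat(points, t, area):
    """Naive univariate K-function estimate at distance t (no edge correction)."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    # Ordered pairs of distinct points within distance t.
    close = sum(1 for i in range(n) for j in range(n)
                if i != j and math.dist(points[i], points[j]) <= t)
    return area * close / (n * (n - 1))


def d_hat(points, is_case, t, area):
    """Estimate of D(t) = K_11(t) - K_22(t), class 1 = cases, class 2 = controls."""
    cases = [p for p, c in zip(points, is_case) if c]
    controls = [p for p, c in zip(points, is_case) if not c]
    return k_hat(cases, t, area) - k_hat(controls, t, area)


def rl_two_standard_errors(points, is_case, t, area, n_sim=199, seed=0):
    """Twice the Monte Carlo standard deviation of the D(t) estimate under RL."""
    rng = random.Random(seed)
    labels = list(is_case)
    sims = []
    for _ in range(n_sim):
        rng.shuffle(labels)  # relabel the fixed locations at random
        sims.append(d_hat(points, labels, t, area))
    return 2 * statistics.stdev(sims)


# Toy data: three tightly clustered cases and four well-separated controls.
pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.0, 5.0), (5.0, 0.0), (5.0, 5.0), (2.5, 2.5)]
lab = [1, 1, 1, 0, 0, 0, 0]
d_obs = d_hat(pts, lab, 1.0, 25.0)
band = rl_two_standard_errors(pts, lab, 1.0, 25.0)
```

Since D(t) = 0 under RL, an estimate lying above `band` would suggest clustering of cases beyond that of the 
controls at distance t.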



8 Discussion and Conclusions 



In this article, we discuss segregation or clustering tests based on nearest neighbor contingency tables (NNCTs). Pielou's 
and Dixon's segregation tests are already in use in the literature (Pielou (1961) and Dixon (1994, 2002a,b)). Pielou's test 
of independence is only appropriate when the null hypothesis implies that the NNCT is based on a random sample of 
(base, NN) pairs, but it is not appropriate when the null case implies that the NNCT is based on data from complete spatial 
randomness (CSR) independence or random labeling (RL) patterns (Ceyhan (2006)). Dixon's tests are appropriate for 
the null patterns of CSR independence or RL (but they are conditional under the CSR independence pattern), which are 
more realistic in practical situations. In the literature, both Pielou's and Dixon's tests are used for the null hypotheses 
of CSR independence or RL. We propose three new tests using the correct (asymptotic) distribution of cell counts in 
NNCTs and a corrected version of Pielou's test based on empirical estimates of its mean and variance. We also compare 
the NNCT-tests with Cuzick-Edward's k-NN and combined tests in an extensive Monte Carlo simulation study and with 
Ripley's K- or L-functions, Diggle's D-function, and pair correlation functions in example data sets. 



For testing segregation or association against the CSR independence or RL patterns, we recommend against using 
Pielou's test, as it gives more false alarms than allowed by the significance level. As a quick fix, one can use the Monte 
Carlo corrected version for rectangular study regions for similar sample sizes. Alternatively, one can also resort to Monte 
Carlo randomization for Pielou's test. Considering the empirical significance levels, empirical power estimates, and 
distributional properties, we recommend version III of the new NNCT-tests when testing for segregation or association. 
Among Cuzick-Edward's tests, the combined version of $T_k$ for k = 1, 2, ..., 5 has slightly better performance than the NNCT-tests 
for the association alternatives, but the gain does not compensate for the computational cost of this test. Figure 4 in Dixon 
(1994, p. 1946) shows that the acceptance regions for Pielou's and Dixon's tests have different shapes, so these tests are 
[Figure 15 plot title: Leukemia Data]
Figure 15: The scatter plots of the locations of cases (circles ○) and controls (triangles △) in North Humberside 
leukemia data set. 

answering different questions. But, the newly proposed NNCT-tests and Dixon's test address the same question about 
the spatial interaction between the classes. On the other hand, Cuzick-Edward's test is designed for the clustering of 
the cases (or the first class in the general framework), so NNCT-tests and Cuzick-Edward's tests also answer similar but 
not identical questions. 
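
The Monte Carlo randomization option for Pielou's test mentioned above can be sketched in Python as follows. The 
statistic is the usual Pearson chi-square statistic of independence for the 2 × 2 NNCT, and its null distribution is 
approximated by relabeling the fixed locations at random (RL). The function names, the 0/1 label coding, and the 
handling of degenerate tables are assumptions made for illustration.

```python
import math
import random


def nearest_neighbor_index(points):
    """Index of each point's nearest neighbor (ties go to the lowest index)."""
    return [
        min((m for m in range(len(points)) if m != i),
            key=lambda m: math.dist(points[i], points[m]))
        for i in range(len(points))
    ]


def nnct_2x2(labels, nn):
    """2 x 2 NNCT: rows are base labels (0 or 1), columns are NN labels."""
    table = [[0, 0], [0, 0]]
    for i, j in enumerate(nn):
        table[labels[i]][labels[j]] += 1
    return table


def pielou_statistic(table):
    """Pearson chi-square statistic of independence for a 2 x 2 table."""
    (a, b), (c, d) = table
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table, treated as no evidence (an assumption)
    return (a + b + c + d) * (a * d - b * c) ** 2 / denom


def rl_randomization_p(points, labels, n_sim=999, seed=0):
    """Monte Carlo p-value of Pielou's statistic under random labeling."""
    rng = random.Random(seed)
    nn = nearest_neighbor_index(points)  # NN structure is fixed under RL
    observed = pielou_statistic(nnct_2x2(labels, nn))
    shuffled = list(labels)
    at_least = 0
    for _ in range(n_sim):
        rng.shuffle(shuffled)
        if pielou_statistic(nnct_2x2(shuffled, nn)) >= observed:
            at_least += 1
    return observed, (at_least + 1) / (n_sim + 1)


# Pielou's statistic for the cell counts of Table 10.
stat_table_10 = pielou_statistic([[25, 41], [39, 113]])
```

Comparing the observed statistic with its relabeling distribution, rather than with chi-square critical values, 
avoids relying on the independence of (base, NN) pairs.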

When testing against CSR independence or RL, NNCT-tests provide information about the spatial interaction at 
about the average NN distance in the data sets. Cuzick-Edward's tests provide information about the k-NN distance. 
Ripley's K- or L-function provides the type and level of spatial interaction at all scales (i.e., at all distances one is 
interested in) when used against the CSR independence pattern. However, due to the cumulative nature of these functions, 
Stoyan's pair correlation function is preferable for large distances. Furthermore, Diggle's D-function provides the level 
of spatial interaction (or clustering of a class when compared to another) at all scales when used against the RL pattern. 
Among these tests, Cuzick-Edward's test is designed only for the two-class case of cases and controls. The NNCT-tests 
can be used for the multivariate spatial interaction between two or more classes. Ripley's L has univariate and bivariate 
versions, while Diggle's D is designed for bivariate pattern analysis. A practical concern about these tests is that code 
for them is not readily available in some statistical software. The methods outlined here have been implemented in 
R version 2.6.2, and the relevant code is available from the author upon request. 

In this article, we have only considered spatial patterns of two classes in a study region. Dixon has extended his 
tests to the multi-class situation with three or more classes (species) (Dixon (2002a)). On the other hand, Pielou's test 
is defined and has only been used for the two-class spatial patterns. Its inappropriateness discourages the immediate 
extension to multi-class patterns. However, the newly introduced versions can easily be extended to the multi-class case. 
[Figure 16 plot title: Second-Order Analysis of Leukemia Clustering]
Figure 16: Second-order analysis of North Humberside childhood leukemia data: Function plotted is Diggle's 
modified bivariate K-function $D(t) = K_{11}(t) - K_{22}(t)$ with i = 1 for leukemia cases and i = 2 for controls. The 
dashed lines around 0 are plus and minus two standard errors of $D(t)$ under RL of cases and controls. 
Empirical Means and Variances of the Test Statistics

sizes                       Means                                  Variances
$(n_1,n_2)$     $M[\mathcal{X}_P^2]$    $M[C_D]$        $V[\mathcal{X}_P^2]$    $V[C_D]$
(10,10)         1.793                   2.021           5.698                   3.420
(10,30)         1.647                   2.020           4.233                   3.660
(10,50)         1.575                   1.997           4.481                   3.857
(30,30)         1.654                   2.007           5.201                   3.774
(30,50)         1.653                   2.009           5.237                   3.787
(50,50)         1.647                   2.016           5.328                   3.905
(100,100)       1.646                   2.010           5.304                   3.837
(200,200)       1.628                   2.005           5.409                   4.053

Table 12: The empirical means and variances for Pielou's and Dixon's segregation tests. 
Appendix: Details of Empirical Correction of Pielou's Test of Segregation 



In this section, we provide the details concerning the estimation of the mean and variance of Pielou's test of segregation. 
First, we plot the kernel density estimates of Pielou's test statistic for various sample size combinations, and the density 
of the corresponding asymptotic distribution. Then, we report the means and variances of the test statistic for each 
sample size combination, and suggest a transformation for the test statistic based on these means and variances. 



In Figure 17, we plot the kernel density estimates for Pielou's test statistic obtained for each sample size combination 
and the density plot of the $\chi^2_1$-distribution. Note that the discrepancy between the density plot of the $\chi^2_1$-distribution and 
the kernel density estimates around 0 is due to the kernel smoothing in density estimation. Otherwise, for larger 
values, the kernel density estimates follow the trend of a $\chi^2_1$-distribution, but perhaps require an adjustment for 
location and scale. Notice also that the kernel density curves are smaller for balanced (i.e., similar) sample sizes and 
larger for unbalanced (i.e., very different) sample sizes compared to the pdf of the $\chi^2_1$-distribution. 
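
A location and scale adjustment of this kind can be sketched by moment matching: choose a scale and a shift so that 
the transformed statistic has the mean (1) and variance (2) of a $\chi^2_1$ variable. The Python sketch below assumes the 
transformation has the form x / sigma - gamma and uses the pooled empirical mean and variance quoted later in this 
appendix (about 1.63 and 5.40). The function and variable names are hypothetical, and the resulting constants are only 
as accurate as these rounded inputs.

```python
import math


def location_scale_correction(mean_emp, var_emp, target_mean=1.0, target_var=2.0):
    """Return (sigma, gamma) so that x / sigma - gamma matches the target moments."""
    # Var[x / sigma - gamma] = var_emp / sigma**2 = target_var
    sigma = math.sqrt(var_emp / target_var)
    # E[x / sigma - gamma] = mean_emp / sigma - gamma = target_mean
    gamma = mean_emp / sigma - target_mean
    return sigma, gamma


sigma_p, gamma_p = location_scale_correction(1.63, 5.40)


def corrected_score(x):
    """Apply the moment-matched correction to a raw Pielou statistic x."""
    return x / sigma_p - gamma_p
```

With these inputs the scale is about 1.64 and the shift is close to zero, so the correction is dominated by the 
rescaling of the statistic.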



Let $M[\mathcal{X}_P^2]$ be the sample mean and $V[\mathcal{X}_P^2]$ be the sample variance of the calculated $\mathcal{X}_P^2$ values. We present the 
empirical means and variances of Pielou's and Dixon's test statistics for each sample size combination in Table 12, 
which suggests that $M[\mathcal{X}_P^2] \approx 1.63$ and $V[\mathcal{X}_P^2] \approx 5.40$. Since the critical values based on the $\chi^2_1$-distribution are used for 
Pielou's test, it is desirable to have the corrected scores approximately distributed as $\chi^2_1$. We transform the $\mathcal{X}_P^2$ 
scores by adjusting for location and scaling as $\mathcal{X}_P^2/\sigma_P - \gamma_P$, so that $E[\mathcal{X}_P^2/\sigma_P - \gamma_P] = 1$ and 
$Var[\mathcal{X}_P^2/\sigma_P - \gamma_P] = 2$ would hold. Such a transformation will convert the $\mathcal{X}_P^2$ values into a variable approximately 
distributed as $\chi^2_1$. 
Using 